CCA - Community Core Audio Preview Release

Introduction
I am proud to announce the first release of CCA, which I have been working on with my mentor, Mathieu Virbel. Community Core Audio (CCA) is a GSoC 2010 project that shares a similar GUI and underlying code base with CCV.

The goal of CCA is to manage voice input, convert speech to text, and output the resulting messages to the network. The preview release of CCA (for Windows) is available for download here. We hope to get feedback from the community on this preview and look forward to the results.

Getting Started
The current version supports only command-picking mode. Do not click the "FREE SPEAKING MODE" button in this preview, as it may cause the application to crash. For details of the two modes, please read: CCA Modes. The current version also recognizes only English digits, because it ships with simple Sphinx resources.

1) Select the "RECORD SOUND" check box to start recording. The waveform is shown dynamically in the viewer window.
2) Unselect the "RECORD SOUND" check box or click the "STOP" button to stop recording.
3) Select the "PLAY/PAUSE" check box to play, and unselect it to pause. Click the "STOP" button to stop playback.
4) After recording audio, click the "SENT TO RECOGNIZE ENGINE" button, and the output viewer will display the sentence you just recorded.
5) Click the "CLEAR SCREEN" button to clear the output viewer.

Configuration
For normal use, no configuration is needed: just download and run it. However, CCA provides some options through config files.

The most important config file is $cca_path/data/config.xml. If you want to use new Sphinx resources, you must specify the path to the new resource files in this XML file. To learn about resource files, please read: Sphinx Resource Files.

The input audio sample rate is also set in config.xml. The input sample rate must be the same as the sample rate of the Acoustic Model (AM), which is part of the resource files. In addition, the file $cca_path/data/commandList.txt is used by CommandPicking mode. See this document: CCA Modes.
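
The sample-rate requirement can be checked before audio is sent to the recognizer. The following is a minimal Python sketch and not part of CCA: it assumes the recording is available as a WAV file and that you already know the rate of your acoustic model. The function name `sample_rate_matches` is hypothetical, and the sketch does not read config.xml, whose layout is not described here.

```python
import wave


def sample_rate_matches(wav_path, am_rate_hz):
    """Return True if the WAV file's sample rate equals the acoustic model's rate.

    am_rate_hz is supplied by the caller (assumption: you know the rate
    your acoustic model expects); nothing is read from config.xml.
    """
    with wave.open(wav_path, "rb") as wav_file:
        return wav_file.getframerate() == am_rate_hz
```

If the check fails, either resample the recording or change the input sample rate in config.xml so that both sides agree.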

Technical Detail
We developed a standalone oF addon for speech recognition, ofxASR, which was released several weeks ago. ofxASR is the core engine of CCA, and it can be used in any oF application. It currently uses CMU Sphinx3 as its Automatic Speech Recognition (ASR) engine, but it is designed to work with other ASR engines as well, such as Mac OSX Speech, because all engines share the same interface. You can get the source of ofxASR here. We also created a class named ofRectPrint, which prints lines of text in a rectangle with auto-scroll and scroll up/down.
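
The idea that every engine shares one interface can be illustrated with a small sketch. It is written in Python for brevity and is not ofxASR's actual API: the names `AsrEngine`, `recognize`, `FixedTextEngine`, and `transcribe` are hypothetical, and the stand-in engine performs no real recognition.

```python
from abc import ABC, abstractmethod


class AsrEngine(ABC):
    """Hypothetical common interface that every recognition backend implements."""

    @abstractmethod
    def recognize(self, samples, sample_rate_hz):
        """Return the recognized text for a buffer of audio samples."""


class FixedTextEngine(AsrEngine):
    """Stand-in backend that returns a preset string instead of decoding audio."""

    def __init__(self, text):
        self._text = text

    def recognize(self, samples, sample_rate_hz):
        return self._text


def transcribe(engine, samples, sample_rate_hz):
    """Application code depends only on the interface, so backends are swappable."""
    return engine.recognize(samples, sample_rate_hz)
```

Because `transcribe` only calls `recognize`, a Sphinx-backed engine and an OS-provided engine could be exchanged without touching the application code, which is the design benefit described above.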

Coming Soon
- Better Sphinx resources that support arbitrary English words instead of only digits.
- The free-speaking mode.
- Output to network.
- OSX and Linux support.



Responses

This is a very interesting project. The use of audio input in combination with other interaction techniques is still largely unexplored and it would be great to have a system with integrated vision tracking and audio processing.
However, I am very disappointed to see that such an interesting open source project gives priority to a proprietary platform. How am I supposed to test this? My platform of choice is Linux.

Unable to check out from either repository. Would love to build on this. Very cool stuff.

Aras… all we can do is open the source and hope we get developers for such platforms. Until then, I am sure Jimbo is working hard on getting this done, but keep in mind this is a preview.

Hi, development on OSX and Linux is in progress. Please be patient; the OSX preview version will be coming in a week.

Great to hear. I look forward to the release.

I have just released a preview version for OS X, but it is not very accurate yet.
Please check http://nuicode.com/news/54

Great job Jimbo. I hope that someone will find the issue with OS X soon. Are there any ongoing efforts to create a Linux version as well? Would I be able to test it?

Aras: This version is just a preview, and we have many features to develop in the next month. What I can promise is that once we have a stable version on Windows and OS X, I will port it to Linux at once.

It would be great if you could port it to Linux. You can check out the code with:
svn checkout http://nuicode.svnrepository.com/svn/cca

The preview release for OS X has been updated. It can recognize any English words and sentences, not only digits. Please check http://nuicode.com/news/55 to learn more and get the download link.
