I am proud to announce the first release of CCA, which I have been working on with my mentor, Mathieu Virbel. Community Core Audio (CCA) is a GSoC 2010 project that shares a similar GUI with CCV, as well as its underlying code base.

The goal of CCA is to manage voice input, convert speech to text, and output the resulting messages to the network. The preview release of CCA (for Windows) is available for download here. We hope to get feedback from the community on this preview and look forward to future results.

Getting Started
The current version only supports command-picking mode. Do not click the "FREE SPEAKING MODE" button in this preview, as it may cause the application to crash. For details on the two modes, please read: CCA Modes. The current version also only supports English digits, because it ships with only simple Sphinx resources.

1) Select the "RECORD SOUND" check box to start recording. The waveform is shown dynamically in the viewer window.
2) Clear the "RECORD SOUND" check box or click the "STOP" button to stop recording.
3) Select the "PLAY/PAUSE" check box to play; clear it to pause. Click the "STOP" button to stop playback.
4) After recording audio, click "SENT TO RECOGNIZE ENGINE"; the output viewer will display the sentence you just recorded.
5) You can click the "CLEAR SCREEN" button to clear the output viewer.

For normal use, no configuration is needed: just download CCA and run it. However, CCA provides some options through config files.

The most important config file is $cca_path/data/config.xml. If you want to use new Sphinx resources, you must specify the paths of the new resource files in this XML file. To learn about resource files, please read: Sphinx Resource Files.

The input audio sample rate is also set in config.xml. It must be the same as the sample rate of the Acoustic Model (AM), which is part of the resource files. The file $cca_path/data/commandList.txt is used by command-picking mode; see this document: CCA Modes.
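As a rough illustration of the two points above, here is a minimal C++ sketch of a sample-rate check and a command lookup. It is not taken from the CCA source: the function names (normalize, loadCommands, pickCommand, sampleRatesMatch) are hypothetical, the one-command-per-line layout of commandList.txt is an assumption, and exact case-insensitive matching is only one plausible reading of what command-picking mode does.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: trim surrounding whitespace (including a trailing '\r')
// and lower-case the text so comparisons are forgiving.
static std::string normalize(const std::string& s) {
    std::size_t first = 0, last = s.size();
    while (first < last && std::isspace(static_cast<unsigned char>(s[first]))) ++first;
    while (last > first && std::isspace(static_cast<unsigned char>(s[last - 1]))) --last;
    std::string out = s.substr(first, last - first);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Read the command list. ASSUMPTION: one command per line; blank lines are skipped.
std::vector<std::string> loadCommands(std::istream& in) {
    std::vector<std::string> commands;
    std::string line;
    while (std::getline(in, line)) {
        std::string cmd = normalize(line);
        if (!cmd.empty()) commands.push_back(cmd);
    }
    return commands;
}

// Return the index of the command that equals the recognized text, or -1 if none does.
int pickCommand(const std::vector<std::string>& commands, const std::string& recognized) {
    const std::string text = normalize(recognized);
    for (std::size_t i = 0; i < commands.size(); ++i) {
        if (commands[i] == text) return static_cast<int>(i);
    }
    return -1;
}

// The input sample rate has to equal the acoustic model's sample rate.
bool sampleRatesMatch(int inputRateHz, int acousticModelRateHz) {
    return inputRateHz == acousticModelRateHz;
}
```

In an application, loadCommands would be fed a std::ifstream opened on the command list, and sampleRatesMatch would be called once at startup with the two rates before any audio is sent to the recognizer.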

Technical Detail
We developed a standalone oF addon for speech recognition, ofxASR, which was released several weeks ago. ofxASR is the core engine of CCA, and it can be used in any oF application. It currently uses CMU Sphinx3 as its Automatic Speech Recognition (ASR) engine, but it is designed to work with other ASR engines as well, such as Mac OS X Speech, since all engines share the same interface. You can get the source of ofxASR here. We also created a class named ofRectPrint, which prints lines of text in a rectangle with auto-scroll and scroll up/down.
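To show why a shared interface makes the engine swappable, here is a minimal C++ sketch. It is not the actual ofxASR API: SpeechEngine, FixedTextEngine, SampleCountEngine, and runRecognition are hypothetical names invented for illustration, and the stub engines do no real recognition.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface (NOT the real ofxASR classes):
// every engine accepts 16-bit PCM samples and returns text.
class SpeechEngine {
public:
    virtual ~SpeechEngine() = default;
    virtual std::string name() const = 0;
    virtual std::string recognize(const std::vector<std::int16_t>& samples) = 0;
};

// Stub engine that always returns the same text, standing in for a real backend.
class FixedTextEngine : public SpeechEngine {
public:
    explicit FixedTextEngine(std::string text) : text_(std::move(text)) {}
    std::string name() const override { return "fixed-text"; }
    std::string recognize(const std::vector<std::int16_t>&) override { return text_; }
private:
    std::string text_;
};

// Second stub engine that reports how many samples it received.
class SampleCountEngine : public SpeechEngine {
public:
    std::string name() const override { return "sample-count"; }
    std::string recognize(const std::vector<std::int16_t>& samples) override {
        return std::to_string(samples.size()) + " samples";
    }
};

// Application code depends only on the interface, so either engine can be passed in.
std::string runRecognition(SpeechEngine& engine, const std::vector<std::int16_t>& samples) {
    return engine.name() + ": " + engine.recognize(samples);
}
```

Because runRecognition only sees the abstract base class, replacing one backend with another would require no change to the calling code, which is the property the shared interface is meant to provide.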

Coming Soon
- Better Sphinx resources that support any English words instead of only digits.
- The free-speaking mode.
- Output to network.
- OS X and Linux support.
