Hard Light Productions Forums
Modding, Mission Design, and Coding => FS2 Open Coding - The Source Code Project (SCP) => Cross-Platform Development => Topic started by: if on October 08, 2009, 05:15:48 pm
-
A word of warning:
While it is usable, this is not a full-fledged application, just a bit more than a proof of concept. The setup requires some knowledge of compiling source code, checking files out of Subversion repositories, installing Linux packages, and, above all, patience.
The whole setup is largely based on the speech recognition software packages from CMU. Once everything is set up you will have voice command recognition in FreeSpace 2 SCP. For example, shouting "Stop" will stop the ship, and "Target View" will target the object in the ship's center view.
I have attached an archive which contains step-by-step instructions (fs2notes.txt) to set up, compile, install, and use the SR/VC in FreeSpace. The archive also contains some other (helper) files which can be used to simplify the setup process.
Good luck!
[attachment deleted by admin]
-
If portej wanders in here, he can back me up on this. I think I suggested this as a possible Speech Rec engine to replace Microsoft's proprietary version a few months ago. What a great first post! Sphinx looked like a real contender. Do you have any more technical information about what/how/etc?
-
I built this some months ago and had almost forgotten about it. I rediscovered it a couple of days ago and refreshed the readme notes (fs2notes.txt) a bit yesterday.
Let's see...
Basically, all this process does is build an executable called fs2live, which runs in the background.
It has a configuration file (fs2.cmd) which maps commands (keywords) to sequences of keys. When a command is successfully recognized, fs2live injects the corresponding key sequence into the X Window System. There are a few operators for combining keys in a sequence: "," ... keys are sent one after another, "+" ... keys are pressed at the same time, "#delay" ... keys are held down for the specified interval (delay) and then released (for example the Afterburner key).
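To give a feel for the format, here is a sketch of what fs2.cmd entries could look like. The keywords, the "=" separator, and the key names below are invented for illustration; the actual syntax is defined by the fs2.cmd shipped in the archive:
stop = BackSpace
target view = y
match speed = m,y
full power to engines = Shift_L+b
afterburner = Tab#1500
The last entry would hold Tab down for 1500 ms and then release it, which is how a timed Afterburner burst could be mapped.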
As I mentioned previously, it makes use of CMU speech recognition software, specifically Sphinx3. Its technical details can be found at: http://cmusphinx.sourceforge.net/html/cmusphinx.php.
Simply put, to build a Voice Command system you'll need a Language Grammar (Vocabulary) and an Acoustic Model.
The Language Grammar for fs2live is built from scratch.
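To give an idea of the vocabulary side, the pronunciation dictionary (fs2.dic, referenced further down) maps each command word to CMU-style phones. A few entries could look like this (illustrative only, not necessarily the actual contents):
STOP         S T AA P
TARGET       T AA R G AH T
VIEW         V Y UW
AFTERBURNER  AE F T ER B ER N ER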
The Acoustic Model is an adaptation of a pre-existing Acoustic Model which comes with the sphinx3 packages; if I remember correctly, it is called AN4. The adaptation is done by recording the voice commands specified in fs2.txt, passing them through a couple of sphinx3 processing tools, and creating something of a patch for the AN4 model.
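For reference, fs2.txt is just the list of command sentences to record, one per line; it goes something along these lines (the actual 91 sentences are in the archive):
stop
target view
afterburner
match speed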
As the main Acoustic Model, AN4, is trained on native US speakers, they may get better SR accuracy. There are a few other free Acoustic Models out there which may be worth looking into as well. Building one from scratch is, from what I remember, not an easy task.
Let me know what else you want to know.
-
Well, I don't have Linux around, so I started following your notes on OS X. Trying to test SoX, I failed on the rec command. That was with SoX from Ports; I'm going to try the one they released next and see if I have any better luck. It couldn't handle the default encoding or something, not sure what that was about.
-
sox is not really required. I should probably update the notes. The format of the recorded voice files is:
sample rate 16000
1 channel (mono)
sample encoding: signed, 16 bit
naming convention: fs2_0001.raw to fs2_0091.raw
Once you have them, by whatever other means, you can skip step 1 and the following substeps:
#for i in `seq 1 $count`; do fn=`printf fs2_%04d $i`; read sent; echo $sent; rec -c1 -r 16000 $fn.wav 2>/dev/null; sleep 1; done < fs2.txt
(press Ctrl-C to skip to the next sentence)
#for i in fs2*.wav; do sox $i -r 16000 -c 1 -s `echo $i | awk -F. '{ print $1 }'`.raw; done
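If a newer SoX build complains about the short -s encoding flag, the long-form options should produce the same raw format; this is the same loop, just with the output encoding spelled out:
#for i in fs2*.wav; do sox $i -r 16000 -c 1 -e signed-integer -b 16 `echo $i | awk -F. '{ print $1 }'`.raw; done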
-
Correction:
The Acoustic Model used is called HUB4. More information about it can be found at:
http://www.speech.cs.cmu.edu/sphinx/models/
AN4 is a Language Model, used (together with HUB4) by the sample program/script sphinx3-simple.
-
Well, AN4 (aka the Census DB) is/contains both a Language Model and an Acoustic Model.
For those interested, here is a short and nice introduction to the speech recognition process, found on the Internet:
http://project.uet.itgo.com/speech.htm
-
I've updated the zip archive attached to the first post with all this info, and some info from previous posts. It is now called fs2live_`date when it was built`.zip
-
Yet another update. The following line from fs2notes:
#cat fs2.txt | awk 'BEGIN {i=1} { printf "<s> "$0" </s> (fs_%04d)\n", i++ }' > fs2.transcription
should be replaced with:
#cat fs2.txt | awk 'BEGIN {i=1} { printf "<s> "$0" </s> (fs2_%04d)\n", i++ }' > fs2.transcription
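Assuming, just for the sake of example, that the first two sentences in fs2.txt are "stop" and "target view", the resulting fs2.transcription would start like:
<s> stop </s> (fs2_0001)
<s> target view </s> (fs2_0002)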
I also have another step, if you want to build an Acoustic model from scratch or check the accuracy of the model.
Train a new Acoustic model, and/or check the Speech Recognition accuracy.
#cd /work
#mkdir fs2train
#cd /work/fs2train
#../SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task fs2
#cd /work/fs2
#cp *.wav ../fs2train/wav
#cp fs2.listoffiles ../fs2train/etc/fs2.fileids
#cp fs2.transcription ../fs2train/etc/
#cp fs2.dic ../fs2train/etc/
#cp fs2.filler ../fs2train/etc/
...
(Or add the attached files from the fs2live.zip archive to the fs2train directory. If you use a different work directory you'll have to modify the config files in etc/.)
...
#cd /work/fs2train
#./scripts_pl/make_feats.pl -ctl etc/fs2.fileids
#./scripts_pl/RunAll.pl
The log file is fs2.html. Additional log files can be found in the logdir directory.
Look for errors (shown in red). A couple are normal; tens or more indicate a problem.
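A quick way to get a rough error count without opening the HTML (just a generic grep over the logs; some matches may be benign):
#cd /work/fs2train
#grep -ri "error" logdir | wc -l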
Check the SR accuracy
#cd /work/fs2train
#cp /usr/local/bin/sphinx3_decode /work/fs2train/bin
#./scripts_pl/decode/slave.pl
The log information is in fs2.html, logdir/decode, and the result/ directory.
More importantly, check the result/fs2.align file. It indicates which sentences/words were not successfully recognized.
You may want to re-record those sentences.
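To re-record a single sentence (say number 12, picked only as an example), you can reuse the commands from the recording step, then copy the new .wav into ../fs2train/wav and run make_feats.pl and RunAll.pl again:
#cd /work/fs2
#i=12; fn=`printf fs2_%04d $i`
#sed -n "${i}p" fs2.txt
#rec -c1 -r 16000 $fn.wav
#sox $fn.wav -r 16000 -c 1 -s $fn.raw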
-
Just FYI, I plan on taking another look at this soon. Don't think we've forgotten about it.
-
Great. I check the forum now and then.
I plan to work on some speech-related projects. It won't be hard for me to dive back into this one from time to time, if there is enough interest.