Using pocketsphinx Part 3: Creating a Corpus

To use pocketsphinx, you will need to create a corpus. A corpus is simply a list of words and phrases you want pocketsphinx_continuous to recognize. For example, a corpus for a simple robot might look like this:

move forward
move back
move left
move right
say hello
say goodbye
power off

If you are using the web interface to generate your language model and dictionary, then you can use your corpus unmodified.

However, if you use the language model toolkit, you will need to enclose each word or phrase in sentence-boundary markers, <s> and </s>, like this:

<s> say hello </s>

The tools in the language model toolkit expect this format.
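
If your corpus is already in a plain file, a short script can add the markers for you. Here is a minimal sketch, assuming the plain corpus is in "corpus.txt"; it writes "corpus.txt.sos", the file name the "mklm" script below expects:

#!/bin/bash
# Wrap every non-empty line of the corpus in sentence-boundary markers.
# "corpus.txt" and "corpus.txt.sos" are assumed file names.
sed -e '/^$/d' -e 's|^|<s> |' -e 's|$| </s>|' corpus.txt > corpus.txt.sos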

Using pocketsphinx Part 2: Using the CMU-Cambridge Statistical Language Modeling Toolkit

If you don’t want to be dependent on a web page to generate your language model and pronunciation dictionary, you can use the CMU-Cambridge Statistical Language Modeling Toolkit.

First, you will need a pronunciation dictionary. I used the “cmu07a.dic” that comes with pocketsphinx. You will find it in

pocketsphinx-0.7/model/lm/en_US/cmu07a.dic

Next, you will need to download the source for the toolkit and build and install it on your Raspberry Pi:

# wget -c http://hivelocity.dl.sourceforge.net/project/cmusphinx/cmuclmtk/0.7/cmuclmtk-0.7.tar.gz
# tar zxvf cmuclmtk-0.7.tar.gz
# pushd cmuclmtk-0.7
# ./configure
# make
# sudo make install
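
If the build succeeds, the toolkit binaries should be on your PATH. A quick sanity check is to look up the tools used in the script below:

# which text2wfreq wfreq2vocab text2idngram idngram2lm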

Once you have installed the toolkit, you can use a script like this to create the files you will need for use with pocketsphinx_continuous:

#!/bin/bash
NAME="robot"
# vocab_type 1 = open vocabulary (out-of-vocabulary words map to <UNK>)
VOCAB_TYPE=1
# the corpus with the <s> and </s> markers added
CORPUS="../corpus.txt.sos"
# build the vocabulary from the word frequencies in the corpus
text2wfreq < "$CORPUS" | wfreq2vocab > "$NAME.vocab"
# convert the corpus into numeric n-gram ids
text2idngram -vocab "$NAME.vocab" -idngram "$NAME.idngram" < "$CORPUS"
# generate the language model in ARPA format
idngram2lm -vocab_type $VOCAB_TYPE -context "$NAME.css" -idngram "$NAME.idngram" -vocab "$NAME.vocab" -arpa "$NAME.arpa"

Name the script “mklm”. You can use the “.vocab” file to create your pronunciation dictionary using a script like this:

#!/bin/bash

NAME="robot"
DICT="../dict/cmu07a-plus.dic"

# start with an empty output dictionary
echo -n "" > "$NAME.dic"

# copy the entry for each vocabulary word (plus any alternate
# pronunciations like "word(2)") from the master dictionary
while read -r line ; do
	# skip the comment lines that wfreq2vocab puts at the top
	if [[ ! $line =~ ^# ]] ; then
		printf "Searching for %-20s" "$line..."
		if egrep "(^$line[[:space:]]|^$line\([0-9]\)[[:space:]])" "$DICT" >> "$NAME.dic" ; then
			echo " [FOUND]"
		else
			echo ""
		fi
	fi
done < "$NAME.vocab"

Name this script "mkdict".

When you run the "mkdict" script, it will show you which of the words you need were found in the pronunciation dictionary. You may find that some of them are missing. Fortunately, it is relatively easy to add words to the dictionary. I created an entry for "reboot" by looking at other words that start with "re" and appending the pronunciation of "boot". The result looks like this:

reboot    R IY B UW T

IMPORTANT: You MUST put a TAB between the word and the pronunciation. If you use a space, pocketsphinx_continuous will not be able to use the word.
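
One way to avoid typing the TAB by hand is to let printf emit it. This is a sketch using the augmented dictionary name from the "mkdict" script above:

printf 'reboot\tR IY B UW T\n' >> ../dict/cmu07a-plus.dic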

Once you have a dictionary that contains all of the words you need, you can run your "mklm" and "mkdict" scripts to generate your language model and dictionary files.

Obviously, you can use the "Sphinx Knowledge Base Tool" mentioned in the previous article to generate the files you need. However, I can think of a lot of scenarios in which a user might not want to submit a corpus to a publicly available web page.

Using pocketsphinx Part 1: Using the Sphinx Knowledge Base Tool

One way to use pocketsphinx is to upload a corpus (a list of words) to the Sphinx Knowledge Base Tool.

The tool will generate a pronunciation dictionary file and a language model file. You will need both of these files to run pocketsphinx_continuous.

Here is the script I use to run pocketsphinx_continuous using a language model and dictionary generated by the Sphinx Knowledge Base Tool:

#!/bin/bash

LM="$(basename "$(pwd)").lm"
DICT="$(basename "$(pwd)").dic"

if [[ ! -f "$LM" ]] ; then
	echo "ERROR: Could not find $LM"
	exit 1
fi

if [[ ! -f "$DICT" ]] ; then
	echo "ERROR: Could not find $DICT"
	exit 1
fi

pocketsphinx_continuous -adcdev sysdefault -lm "$LM" -dict "$DICT" 2> /dev/null | egrep '(^[0-9]+: |^READY)'

This version of the script derives the file names from the name of the working directory. For example, if the files are in a directory named “8650”, the script will use “8650.lm” for the language model and “8650.dic” for the dictionary.

This is a slightly different version of the script that allows debug output to be viewed:

#!/bin/bash

BASEDIR="$(basename "$(pwd)")"

LM="$BASEDIR.lm"
DICT="$BASEDIR.dic"

if [[ ! -f "$LM" ]] ; then
	echo "ERROR: Could not find $LM"
	exit 1
fi

if [[ ! -f "$DICT" ]] ; then
	echo "ERROR: Could not find $DICT"
	exit 1
fi

if [[ "$1" == "-v" || "$1" == "--verbose" || "$1" == "-d" || "$1" == "--debug" ]] ; then
	pocketsphinx_continuous -adcdev sysdefault -lm "$LM" -dict "$DICT"
else
	pocketsphinx_continuous -adcdev sysdefault -lm "$LM" -dict "$DICT" 2> /dev/null | egrep '(^[0-9]+: |^READY)'
fi
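
Whichever version you use, the filtered output is easy to feed into another command. Here is a rough sketch of a dispatcher, assuming the script above was saved as "listen" and the robot corpus from Part 3 is in use; the recognizer prints lines like "000000001: move forward", so the patterns match anywhere in the line:

#!/bin/bash

# Read recognized utterances from "listen" and act on them.
# Replace the echo commands with whatever your robot actually needs.
./listen | while read -r utt ; do
	case "$utt" in
		*"move forward"*) echo "COMMAND: forward"  ;;
		*"move back"*)    echo "COMMAND: back"     ;;
		*"power off"*)    echo "COMMAND: poweroff" ;;
		READY*)           ;;   # ignore the READY status lines
	esac
done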

Packages installed on my Raspberry Pi

This is a list of all the packages I added to my Raspberry Pi.

apt-file
bison
cpp-4.4-doc
cpp-4.7
doc-base
eject
esound-common
festival
festival-dev
festival-doc
festlex-cmu
festlex-poslex
festvox-kallpc16k
g++-4.7
gcc-4.4-doc
gcc-4.7
gcc-doc-base
giblib1:armhf
gstreamer0.10-pulseaudio:armhf
hdparm
libapt-pkg-perl
libasound2-dev:armhf
libasound2-plugins:armhf
libaudiofile-dev:armhf
libaudiofile1:armhf
libavcodec53:armhf
libavutil51:armhf
libbison-dev:armhf
libcap2:armhf
libconfig-file-perl
libdirac-encoder0:armhf
libesd0:armhf
libesd0-dev:armhf
libestools2.1:armhf
libestools2.1-dev
libexpat1-dev
libfftw3-3:armhf
libgpm2:armhf
libgsm1:armhf
libjack-jackd2-0:armhf
liblist-moreutils-perl
libmp3lame0:armhf
libncurses5-dev
libopenjpeg2:armhf
libreadline5:armhf
libregexp-assemble-perl
libschroedinger-1.0-0:armhf
libspeex-dev:armhf
libspeex1:armhf
libspeexdsp-dev:armhf
libspeexdsp1:armhf
libssl-dev
libssl-doc
libstdc++6-4.7-dev
libstdc++6-4.7-doc
libsystemd-daemon0:armhf
libtheora0:armhf
libtinfo-dev:armhf
libupower-glib1
libuuid-perl
libva1:armhf
libvpx1:armhf
libwebrtc-audio-processing-0:armhf
libx264-123:armhf
libxvidcore4:armhf
libyaml-tiny-perl
lsb-release
m4
manpages-posix
manpages-posix-dev
pm-utils
powermgmt-base
pulseaudio
pulseaudio-esound-compat
pulseaudio-module-x11
pulseaudio-utils
python-apt
python-apt-common
python-dbus-dev
python2.6
python2.6-minimal
python2.7-dev
rtkit
screen
scrot
speex
speex-doc
sshfs
upower
vim
vim-runtime
x11-xserver-utils

Some of these packages are required for building sphinxbase and pocketsphinx, but I don’t know which ones, so I have included the whole list for anyone trying to follow my instructions in the previous post.

Building Pocketsphinx for the Raspberry Pi

I created a nice cross-chroot system to build software for the Raspberry Pi on an Intel Core i7 build machine. However, the source for pocketsphinx is brain-damaged and has to be built on the target. It is a relatively quick build, even on the Raspberry Pi, so I did not bother trying to troubleshoot the problem with the source.

I used the 0.7 version of pocketsphinx and sphinxbase. The latest version is 0.8.

After downloading the tarballs, extract them to the same directory so that you have a tree like this:

    work/
         sphinx/
                pocketsphinx-0.7/
                sphinxbase-0.7/

The reason for this is that when you build pocketsphinx, it will look for the sphinxbase source in “../sphinxbase-0.7”.

Build sphinxbase first and install it with these commands:

# ./configure
# make
# sudo make install

Build pocketsphinx using the same set of commands:

# ./configure
# make
# sudo make install

If the configure command complains about missing libraries, you will have to install the package that provides the development version of the library (which includes the header files). The full list of packages installed on my Raspberry Pi appears above.

I recommend adding this patch to pocketsphinx so that you can say “goodbye” without causing
pocketsphinx_continuous to exit:

--- pocketsphinx-0.7.old/src/programs/continuous.c	2011-04-14 12:27:25.000000000 -0400
+++ pocketsphinx-0.7.new/src/programs/continuous.c	2012-12-14 20:18:30.709392175 -0500
@@ -319,12 +319,14 @@ recognize_from_microphone()
         printf("%s: %s\n", uttid, hyp);
         fflush(stdout);
 
+#ifdef EXIT_IF_GOODBYE
         /* Exit if the first word spoken was GOODBYE */
         if (hyp) {
             sscanf(hyp, "%s", word);
             if (strcmp(word, "goodbye") == 0)
                 break;
         }
+#endif // EXIT_IF_GOODBYE
 
         /* Resume A/D recording for next utterance */
         if (ad_start_rec(ad) < 0)

You'll want to be able to say "goodbye" when you are finished creating your version of GLaDOS.
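
To apply the patch, assuming you saved the diff above as "goodbye.patch" in the sphinx directory, run something like this before building:

# pushd pocketsphinx-0.7
# patch -p1 < ../goodbye.patch

Because the change wraps the exit code in an #ifdef, the goodbye exit stays compiled out unless you define EXIT_IF_GOODBYE yourself (for example, by passing CFLAGS=-DEXIT_IF_GOODBYE to configure).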

Embedding H.264 Video in a Web Page

In my previous post, I wanted to embed a video without using YouTube or Flash. After a quick search, I found this article:

http://www.w3schools.com/html/html_videos.asp

I used the following code to embed the video:

    <video controls="controls">
        <source src="http://kerneldriver.org/blog/wp-content/uploads/2012/10/remote-control-test-circuit.mp4" type="video/mp4" />
        <source src="http://kerneldriver.org/blog/wp-content/uploads/2012/10/remote-control-test-circuit.ogg" type="video/ogg" />
        Your browser does not support the video tag.
    </video>

Be careful if you are doing this in a WYSIWYG editor like the one that comes with WordPress. You will have to type the above HTML with the “HTML” or “Text” tab selected.

I had to convert the MP4 file to Ogg format so that the video would play in Firefox. I used the following script to do the conversion:

#!/bin/bash
if [[ -z "$*" ]] ; then
        echo "Usage: $(basename "$0") <file>"
        echo ""
        echo "       file        mp4 file to convert to ogg"
        echo ""
        exit 22
fi

if [[ ! -f "$*" ]] ; then
        echo "The file \"$*\" does not exist"
        echo ""
        exit 22
fi

NAME="$*"
OUTPUT="${NAME%.*}.ogg"

# transcode to Vorbis audio and Theora video in an Ogg container,
# forcing the video bitrate to 1 Mbit/s
avconv -i "$NAME" -acodec libvorbis -vcodec libtheora -f ogg -b:v 1024k "$OUTPUT"

Notice that I force the video bitrate to 1 Mbit/s. This is because the default chosen by “avconv” results in a video with a lot of artifacts.
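
If you save the script under a name like "mp4toogg" (a made-up name), converting a file is a one-liner; the file name below is the one from the embed code above:

# ./mp4toogg remote-control-test-circuit.mp4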

To make things confusing, you have to install the “ffmpeg” package on Ubuntu to get “avconv”. However, if you run “ffmpeg”, you get the following message:

[user@machine:~] ffmpeg
ffmpeg version 0.8.5-4:0.8.5-0ubuntu0.12.04.1, Copyright (c) 2000-2012 the Libav developers
  built on Jan 24 2013 18:01:36 with gcc 4.6.3
*** THIS PROGRAM IS DEPRECATED ***
This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...

Use -h to get full help or, even better, run 'man ffmpeg'

Voice Controlled WiFi Robot Part 4

Test Circuit

This is a test circuit I made to verify that I could use 3.3V to control the motors.

I created a test circuit with a 5V regulator, a 3.3V regulator, and an N-channel MOSFET switch. I passed the output of the 3.3V regulator to a switch that fed the gate of the MOSFET.

The original plan was to use the 5V regulator output to provide power to the 3.3V regulator and rear motor. However, after some experimentation, I decided that the 3.3V regulator could run directly off the 6V battery. I also used the 6V battery as the supply for the rear motor.

I will eventually need the 5V regulator to provide power to the on-board Raspberry Pi, so testing that circuit was not a waste of time.

The following video shows me testing the circuit by pressing the green button to simulate a 3.3V signal from the Raspberry Pi going into the gate of the N-channel MOSFET.


Voice Controlled WiFi Robot Part 3

Robot Chassis

This is the chassis of the remote controlled car.

This is the chassis of the remote controlled car after I removed the body and the circuit board. The white connectors go to the forward steering motor and the rear drive motor.

I tested the motors using the 6V battery that came with the car. I was able to get the steering motor to turn the front wheels left and right. I also ran the rear wheels forward and backward. I wanted to make sure that the motors did not require any special drive signal.

Voice Controlled WiFi Robot Part 2

Original Circuit Board

This is the original circuit board that came with the remote control car.

This is what the remote controlled car looked like after I removed the body. The receiver circuit board is surprisingly modular. After a brief examination, I could see that the same board could be used for both the 27 MHz and 49 MHz bands. One really nice feature is that connectors are used for the connections to the front steering motor and the rear drive motor. This makes it easy to connect the motors to another circuit.

Voice Controlled WiFi Robot Part 1

Remote Control Car

This is the remote control car that I am using to build my robot.

I am building a voice controlled WiFi robot. I will use one Raspberry Pi to convert speech to text and send commands over WiFi to another Raspberry Pi that will control the robot.

For example, if I say, “Forward”, the local Raspberry Pi will convert that into a command and send it over WiFi to the remote Raspberry Pi. The remote Raspberry Pi will then switch on the power to the drive wheels to move the robot forward.
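
A minimal sketch of that command path could use netcat; the host name "robot-pi" and port 5000 are made up for the sketch:

# On the robot's Raspberry Pi (netcat-traditional syntax):
nc -l -p 5000 | while read -r cmd ; do
	echo "received command: $cmd"   # replace with motor control
done

# On the speech-recognition Raspberry Pi (-q 1 exits after sending):
echo "forward" | nc -q 1 robot-pi 5000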