Difference between revisions of "Voice control (sphinx+festival)"
(→Text speech) |
(→Control program MYRAbot) |
||
Line 114: | Line 114: | ||
==Control program MYRAbot== | ==Control program MYRAbot== | ||
− | + | This program allows us teleoperate MYRAbot and point objects with the [[MYRAbot's arm control (bioloid+arduino)|arm]] using our voice. The program is subscribe to the ''topic'' "recognizer/output" (publised by pocketsphinx) and publishes the ''topics'' "cmd_vel" (base control) and "selected_object" (point objects using the program [[Objects recognition and position calculation (webcam)#To point objects (Choose)]|choose]). We will create a file named "control_voz.cpp" within the folder "src" of the created ``package'' with the content that is shown below: | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
<syntaxhighlight lang="cpp" enclose="div"> | <syntaxhighlight lang="cpp" enclose="div"> | ||
Line 381: | Line 375: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
+ | |||
+ | |||
+ | |||
+ | <!-- | ||
Para poder compilar y generar el ejecutable del programa añadiremos la siguiente línea de código al final del archivo "CMakeLists.txt" de nuestro ''package'': | Para poder compilar y generar el ejecutable del programa añadiremos la siguiente línea de código al final del archivo "CMakeLists.txt" de nuestro ''package'': |
Revision as of 22:03, 24 April 2014
Preliminary steps
We use CMU Pocket Sphinx for speech recognition developed on Carnegie Mellon University, and Festival Speech Synthesis System for text speech developed on University of Edinburgh.
Currently the ROS package pocketsphinx is available for groovy and hydro distributions. We can install it executing the next command in a terminal:
sudo apt-get install ros-"groovy or hydro"-pocketsphinx
We have to install the ubuntu package CMU Pocket Sphinx in previous ROS distributions. We will execute the next command in a terminal to install it:
sudo apt-get install gstreamer0.10-pocketsphinx
Moreover for previous ROS distributions, we have to download the ROS stack "rharmony" by University of Albany in order to integrate CMU Pocket Sphinx in ROS. We will place in the ROS workspace and we will execute the next commands in a terminal:
svn checkout http://albany-ros-pkg.googlecode.com/svn/trunk/rharmony
rosmake --rosdep-install pocketsphinx
Festival Speech Synthesis System is intigrated in the ROS package sound_play.
Speech recognition
First, we will execute the next command in a terminal within our workspace in order to create a package named "voz_fer" that will content the voice control programs:
roscreate-pkg voz_fer pocketsphinx sound_play std_msgs roscpp
We have to select the vocabulary that we use to control the robot. We use short sentences for the orders in order to reduce the recognition mistakes. We will create a file named "comandos_voz.txt" within the folder "config" of the created package with the content that is show below (each sentence must be in a new line):
start speech
stop speech
go forward
go back
turn left
turn right
speed up
slow down
rotate left
rotate right
stop
point beer
point cocacola
fold arm
We will upload and compile the created file "comandos_voz.txt" in the next link in order to generate the recognition vocabulary:
We will download and rename (commandos_voz.*) the generated files within the folder "config" of the generated package. We will create a file named "comados_voz.launch" within the folder "launch" of the package in order to start CMU Pocket Sphinx with our vocabulary:
<launch>
<node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">
<param name="lm" value="$(find voz_fer)/config/comandos.lm"/>
<param name="dict" value="$(find voz_fer)/config/comandos.dic"/>
</node>
</launch>
We will execute the next command in a terminal in order to start the speech recognition:
roslaunch voz_fer comandos_voz.launch
The results of the speech recognition are published in the topic "recognizer/output" (message type std_msgs/String). We will execute the next command in a terminal in order to see the results of the speech recognition:
rostopic echo recognizer/output
Text speech
We have to use the class sound_play::SoundClient in order to use text speech in our program. The public functions of this class are shown below:
Sound builtinSound (int id)
Create a builtin Sound.
void play (int sound)
Play a buildin sound.
void playWave (const std::string &s)
Plays a WAV or OGG file.
void repeat (const std::string &s)
Say a string repeatedly.
void say (const std::string &s, const std::string &voice="voice_kal_diphone")
Say a string.
void setQuiet (bool state)
Turns warning messages on or off.
SoundClient (ros::NodeHandle &nh, const std::string &topic)
Create a SoundClient that publishes on the given topic.
SoundClient ()
Create a SoundClient with the default topic.
void start (int sound)
Play a buildin sound repeatedly.
void startWave (const std::string &s)
Plays a WAV or OGG file repeatedly.
void stop (int sound)
Stop playing a built-in sound.
void stopAll ()
Stop all currently playing sounds.
void stopSaying (const std::string &s)
Stop saying a string.
void stopWave (const std::string &s)
Stop playing a WAV or OGG file.
Sound voiceSound (const std::string &s)
Create a voice Sound.
Sound waveSound (const std::string &s)
We will use the function "say" that say once a string using the default synthesized voice "voice_kal_diphone". We have to use the function "sleep(seconds)" in order to pause the program execution to say the string.
Control program MYRAbot
This program allows us teleoperate MYRAbot and point objects with the arm using our voice. The program is subscribe to the topic "recognizer/output" (publised by pocketsphinx) and publishes the topics "cmd_vel" (base control) and "selected_object" (point objects using the program [[Objects recognition and position calculation (webcam)#To point objects (Choose)]|choose]). We will create a file named "control_voz.cpp" within the folder "src" of the created ``package with the content that is shown below:
#include "std_msgs/Int16.h"
#include "std_msgs/String.h"
#include "geometry_msgs/Twist.h"
#include "sound_play/sound_play.h"
int inicio = 0;
int rotate = 0;
int turn = 0;
int start = 0;
int stop = 1;
float linear_vel = 0.0;
float angular_vel = 0.0;
std_msgs::String comando_voz;
void sleepok(int t, ros::NodeHandle &n)
{
if (n.ok())
sleep(t);
}
void speech(const std_msgs::String& voz)
{
::comando_voz = voz;
ros::NodeHandle n;
ros::Publisher object_pub_=n.advertise<std_msgs::Int16>("selected_object", 1);
ros::Publisher move_base_pub_=n.advertise<geometry_msgs::Twist>("cmd_vel", 1);
std_msgs::Int16 objeto;
geometry_msgs::Twist mover_base;
if (::comando_voz.data == "start speech")
{
inicio = 1;
}
else if (::comando_voz.data == "stop speech")
{
inicio = 0;
}
else if (inicio == 1)
{
if (::comando_voz.data == "stop")
{
::linear_vel = 0.0;
::angular_vel = 0.0;
}
else if (::comando_voz.data == "go forward")
{
::linear_vel = 0.1;
::angular_vel = 0.0;
}
else if (::comando_voz.data == "go back")
{
::linear_vel = -0.1;
::angular_vel = 0.0;
}
else if (::comando_voz.data == "turn left")
{
if (::angular_vel <= 0 || ::rotate == 1)
{
::angular_vel = 0.1;
::rotate = 0;
::turn = 1;
}
else
{
::angular_vel = ::angular_vel + 0.1;
}
}
else if (::comando_voz.data == "turn right")
{
if (::angular_vel >= 0 || ::rotate == 1)
{
::angular_vel = -0.1;
::rotate = 0;
::turn = 1;
}
else
{
::angular_vel = ::angular_vel - 0.1;
}
}
else if (::comando_voz.data == "rotate left")
{
if (::angular_vel <= 0 || ::turn == 1)
{
::angular_vel = 0.1;
::turn = 0;
::rotate = 1;
}
::linear_vel = 0.0;
}
else if (::comando_voz.data == "rotate right")
{
if (::angular_vel >= 0 || ::turn == 1)
{
::angular_vel = -0.1;
::turn = 0;
::rotate = 1;
}
::linear_vel = 0.0;
}
else if (::comando_voz.data == "speed up")
{
if (::linear_vel == 0)
{
if (::angular_vel > 0 && ::rotate == 1)
{
::angular_vel= ::angular_vel + 0.05;
}
else if (::rotate == 1)
{
::angular_vel = ::angular_vel - 0.05;
}
}
else
{
if (::linear_vel > 0)
{
::linear_vel = ::linear_vel + 0.05;
}
else
{
::linear_vel = ::linear_vel - 0.05;
}
}
}
else if (::comando_voz.data == "slow down")
{
if (::linear_vel == 0)
{
if (::angular_vel > 0 && ::rotate == 1)
{
::angular_vel = ::angular_vel - 0.05;
}
else if (::rotate == 1)
{
::angular_vel = ::angular_vel + 0.05;
}
}
else
{
if (::linear_vel > 0)
{
::linear_vel = ::linear_vel - 0.05;
}
else
{
::linear_vel = ::linear_vel + 0.05;
}
}
}
else if (::comando_voz.data == "point beer")
{
objeto.data = 1;
object_pub_.publish(objeto);
}
else if (::comando_voz.data == "point cocacola")
{
objeto.data = 2;
object_pub_.publish(objeto);
}
else if (::comando_voz.data == "fold arm")
{
objeto.data = 100;
object_pub_.publish(objeto);
}
}
}
int main(int argc, char **argv)
{
ros::init(argc, argv, "control_voz");
ros::NodeHandle n;
ros::Subscriber speech_sub_= n.subscribe("/recognizer/output", 1, speech);
ros::Publisher object_pub_=n.advertise<std_msgs::Int16>("selected_object", 1);
ros::Publisher move_base_pub_=n.advertise<geometry_msgs::Twist>("cmd_vel", 1);
ros::Rate loop_rate(10);
geometry_msgs::Twist mover_base;
sound_play::SoundClient voz_robot;
sleepok(1, n);
while (ros::ok())
{
mover_base.linear.x = ::linear_vel;
mover_base.angular.z = ::angular_vel;
move_base_pub_.publish(mover_base);
if (::comando_voz.data == "start speech" && ::start == 0)
{
voz_robot.say("bep bep. I am listening you");
sleepok(4, n);
::start = 1;
::stop = 0;
}
else if (::comando_voz.data == "stop speech" && ::stop == 0)
{
voz_robot.say("bep bep. I stop listening");
sleepok(4, n);
::stop = 1;
::start = 0;
}
ros::spinOnce();
loop_rate.sleep();
}
ros::spin();
return 0;
}