Difference between revisions of "Voice control (sphinx+festival)"

From robotica.unileon.es
Jump to: navigation, search
(Speech recognition)
 
(15 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
[[Mobile manipulation | < go back to main]]
 
[[Mobile manipulation | < go back to main]]
  
==Preliminary steps==
+
 
 +
----
 +
{| style="border: solid 0px white; width: 100%"
 +
!Related articles
 +
----
 +
!Other articles
 +
----
 +
|-
 +
| valign="top" width="50%" |
 +
 
 +
[[MYRAbot's arm control (bioloid+arduino)]]<br/>
 +
[[Objects recognition and position calculation (webcam)]]<br/>
 +
[[MYRAbot's arm model for simulation (urdf+gazebo)]]<br/>
 +
[[MYRAbot model for simulation (urdf+gazebo)]]<br/>
 +
[[Integration of MYRAbot in moveIt! (gazebo+moveIt!)]]<br/>
 +
[[MYRAbot's Teleoperation with xbox360 wireless controller (xbox360 controller+joy)]]
 +
 
 +
| valign="top" |
 +
 
 +
[[CeRVaNTeS' arm and base control (maxon+epos2)]]<br/>
 +
[[CeRVaNTeS' arm model for simulation (urdf+gazebo)]]<br/>
 +
[[CeRVaNTeS model for simulation (urdf+gazebo)]]<br/>
 +
[[Integration of CeRVaNTeS in moveIt! (gazebo+moveIt!)]]<br/>
 +
[[CeRVaNTeS' Teleoperation with xbox360 wireless controller (xbox360 controller+joy)]]
 +
 
 +
|}
 +
----
 +
 
 +
 
 +
=Preliminary steps=
  
 
We use [http://cmusphinx.sourceforge.net/ CMU Pocket Sphinx] for speech recognition developed on [http://www.cmu.edu/index.shtml Carnegie Mellon University], and [http://www.cstr.ed.ac.uk/projects/festival/ Festival Speech Synthesis System] for text speech developed on [http://www.ed.ac.uk/home University of Edinburgh].
 
We use [http://cmusphinx.sourceforge.net/ CMU Pocket Sphinx] for speech recognition developed on [http://www.cmu.edu/index.shtml Carnegie Mellon University], and [http://www.cstr.ed.ac.uk/projects/festival/ Festival Speech Synthesis System] for text speech developed on [http://www.ed.ac.uk/home University of Edinburgh].
Line 22: Line 51:
 
[http://www.cstr.ed.ac.uk/projects/festival/ Festival Speech Synthesis System] is intigrated in the [http://wiki.ros.org/ ROS] ''package'' [http://wiki.ros.org/sound_play?distro=electric sound_play].
 
[http://www.cstr.ed.ac.uk/projects/festival/ Festival Speech Synthesis System] is intigrated in the [http://wiki.ros.org/ ROS] ''package'' [http://wiki.ros.org/sound_play?distro=electric sound_play].
  
==Speech recognition==
+
=Speech recognition=
  
 
First, we will execute the next command in a terminal within our workspace in order to create a ''package'' named "voz_fer" that will content the voice control programs:
 
First, we will execute the next command in a terminal within our workspace in order to create a ''package'' named "voz_fer" that will content the voice control programs:
Line 51: Line 80:
 
[http://www.speech.cs.cmu.edu/tools/lmtool-new.html Sphinx knowledge base tool]
 
[http://www.speech.cs.cmu.edu/tools/lmtool-new.html Sphinx knowledge base tool]
  
 
+
We will download and rename (commandos_voz.*) the generated files within the folder "config" of the generated ''package''. We will create a file named "comados_voz.launch" within the folder "launch" of the ''package'' in order to start [http://cmusphinx.sourceforge.net/ CMU Pocket Sphinx] with our vocabulary:
 
 
<!--
 
 
 
Descargaremos los archivos generados al directorio "config" del ''package'' creado y los renombraremos como "comandos_voz.*". Vamos a crear un ''launcher'' para poder ejecutar el programa de reconocimiento usando el vocabulario que hemos creado. En el directorio "launch" el ''package'' crearemos un archivo llamado "comandos_voz.launch" con el siguiente contenido:
 
  
 
<syntaxhighlight lang="xml">
 
<syntaxhighlight lang="xml">
Line 68: Line 93:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Para iniciar el reconocimiento de voz simplemente debemos ejecutar en un terminal el siguiente comando:
+
We will execute the next command in a terminal in order to start the speech recognition:
  
 
<syntaxhighlight>roslaunch voz_fer comandos_voz.launch</syntaxhighlight>
 
<syntaxhighlight>roslaunch voz_fer comandos_voz.launch</syntaxhighlight>
  
Los resultados de reconocimiento de voz se publican en el ''topic'' "recognizer/output", en un mensaje tipo [http://docs.ros.org/api/std_msgs/html/msg/String.html std_msgs/String]. Podemos ver el resultado producido al hablar ejecutando el siguiente comando en otro terminal:
+
The results of the speech recognition are published in the ''topic'' "recognizer/output" (message type [http://docs.ros.org/api/std_msgs/html/msg/String.html std_msgs/String]). We will execute the next command in a terminal in order to see the results of the speech recognition:
  
 
<syntaxhighlight>rostopic echo recognizer/output</syntaxhighlight>
 
<syntaxhighlight>rostopic echo recognizer/output</syntaxhighlight>
  
==Locución de texto==
+
=Text speech=
  
Para poder realizar la locución de texto desde nuestro programa necesitamos hacer uso de la clase sound_play::SoundClient, en la cul se definen las siguientes funciones públicas:
+
We have to use the class sound_play::SoundClient in order to use text speech in our program. The public functions of this class are shown below:
  
 
<syntaxhighlight lang="cpp">
 
<syntaxhighlight lang="cpp">
Line 114: Line 139:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
La función que vamos a emplear es "say", que pronuncia una cadena de texto una sola vez, usando la voz por defecto "voice_kal_diphone". Para completar la reproducción de un archivo o cadena de texto es necesario hacer una pausa en la ejecución de nuestro código usando la función "sleep(segundos)".
+
We will use the function "say" that say once a string using the default synthesized voice "voice_kal_diphone". We have to use the function "sleep(seconds)" in order to pause the program execution to say the string.
  
==Programa de control MYRAbot==
+
=Control program MYRAbot=
  
Este programa permite teleoperar mediante la voz el MYRAbot, así como dar una serie de órdenes para el señalado de objetos presentes en la escena. El programa crea un ''nodo'' llamado "control_voz" el cual se suscribe al ''topic'' "recognizer/output", publicado por pocketsphinx, y publica los ''topics'' "cmd_vel", para el control de los movimientos de la base, y "selected_object", para la selección de opciones de apuntado del [[Control brazo MYRAbot (bioloid+arduino)|brazo]], al que está suscrito el programa [[Detección y cálculo de posición de objetos (cámara web)#Señalar objetos (Escoja)|escoja]]. En el directorio "src" de nuestro ''package'' crearemos un archivo llamado "control_voz.cpp", con el siguiente contenido:
+
This program allows us teleoperate MYRAbot and point objects with the [[MYRAbot's arm control (bioloid+arduino)|arm]] using our voice. The program is subscribe to the ''topic'' "recognizer/output" (publised by pocketsphinx) and publishes the ''topics'' "cmd_vel" (base control) and "selected_object" (point objects using the program [[Objects recognition and position calculation (webcam)#To point objects (Choose)]|choose]). We will create a file named "control_voz.cpp" within the folder "src" of the created ``package'' with the content that is shown below:
  
 
<syntaxhighlight lang="cpp" enclose="div">
 
<syntaxhighlight lang="cpp" enclose="div">
Line 380: Line 405:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Para poder compilar y generar el ejecutable del programa añadiremos la siguiente línea de código al final del archivo "CMakeLists.txt" de nuestro ''package'':
+
We have to add the next code line to the file "CMakeLists.txt" of the ''package'' in order to compile and generate the executable:
  
 
<syntaxhighlight>rosbuild_add_executable(control_voz src/control_voz.cpp)</syntaxhighlight>
 
<syntaxhighlight>rosbuild_add_executable(control_voz src/control_voz.cpp)</syntaxhighlight>
  
Para comenzar la compilación debemos ejecutar en un terminal la siguiente secuencia de comandos:
+
We will execute the next commands in a terminal in order to compile:
  
 
<syntaxhighlight>
 
<syntaxhighlight>
Line 391: Line 416:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Con el programa compilado y creado el ejecutable, vamos a crear un ''launcher'' para lanzar todos los programas necesarios para su ejecución. Crearemos en el directorio "launch" del ''package'' un archivo llamado "control_voz.launch" con el siguiente contenido:
+
We will create a file named "control_voz.launch" within the folder "launch" of the ''package'' with the content that is shown below in order to start all the programs:
  
 
<syntaxhighlight lang="xml">
 
<syntaxhighlight lang="xml">
Line 409: Line 434:
 
</syntaxhighlight>
 
</syntaxhighlight>
  
Para iniciar el ''launcher'' simplemente deberemos ejecutar el siguiente comando:
+
We will execute the next command in a terminal in order to start the voice control:
  
 
<syntaxhighlight>roslaunch voz_fer control_voz.launch</syntaxhighlight>
 
<syntaxhighlight>roslaunch voz_fer control_voz.launch</syntaxhighlight>
  
En el siguiente vídeo puede verse el [[Modelo para simulación MYRAbot (urdf+gazebo)|modelo del MYRAbot]] en [http://wiki.ros.org/simulator_gazebo?distro=electric gazebo], mientras es teleoperado usando los comandos de voz del vocabulario seleccionado:
+
We can see the [[MYRAbot model for simulation (urdf+gazebo)|MYRAbot model]] in [http://wiki.ros.org/simulator_gazebo?distro=electric gazebo] using the voice control in the next video:
  
 
<center><videoflash>_hvuvlRyNt0</videoflash></center>
 
<center><videoflash>_hvuvlRyNt0</videoflash></center>
  
<center>[http://www.youtube.com/watch?v=_hvuvlRyNt0 Ver vídeo en YouTube]</center>
+
<center>[http://www.youtube.com/watch?v=_hvuvlRyNt0 Watch in YouTube]</center>
 +
 
 +
 
 +
----
 +
{| style="border: solid 0px white; width: 100%"
 +
!Related articles
 +
----
 +
!Other articles
 +
----
 +
|-
 +
| valign="top" width="50%" |
 +
 
 +
[[MYRAbot's arm control (bioloid+arduino)]]<br/>
 +
[[Objects recognition and position calculation (webcam)]]<br/>
 +
[[MYRAbot's arm model for simulation (urdf+gazebo)]]<br/>
 +
[[MYRAbot model for simulation (urdf+gazebo)]]<br/>
 +
[[Integration of MYRAbot in moveIt! (gazebo+moveIt!)]]<br/>
 +
[[MYRAbot's Teleoperation with xbox360 wireless controller (xbox360 controller+joy)]]
 +
 
 +
| valign="top" |
 +
 
 +
[[CeRVaNTeS' arm and base control (maxon+epos2)]]<br/>
 +
[[CeRVaNTeS' arm model for simulation (urdf+gazebo)]]<br/>
 +
[[CeRVaNTeS model for simulation (urdf+gazebo)]]<br/>
 +
[[Integration of CeRVaNTeS in moveIt! (gazebo+moveIt!)]]<br/>
 +
[[CeRVaNTeS' Teleoperation with xbox360 wireless controller (xbox360 controller+joy)]]
  
-->
+
|}
 +
----
  
  
 
[[Mobile manipulation | < go back to main]]
 
[[Mobile manipulation | < go back to main]]

Latest revision as of 11:11, 29 September 2014

< go back to main



Related articles
Other articles

MYRAbot's arm control (bioloid+arduino)
Objects recognition and position calculation (webcam)
MYRAbot's arm model for simulation (urdf+gazebo)
MYRAbot model for simulation (urdf+gazebo)
Integration of MYRAbot in moveIt! (gazebo+moveIt!)
MYRAbot's Teleoperation with xbox360 wireless controller (xbox360 controller+joy)

CeRVaNTeS' arm and base control (maxon+epos2)
CeRVaNTeS' arm model for simulation (urdf+gazebo)
CeRVaNTeS model for simulation (urdf+gazebo)
Integration of CeRVaNTeS in moveIt! (gazebo+moveIt!)
CeRVaNTeS' Teleoperation with xbox360 wireless controller (xbox360 controller+joy)



Preliminary steps

We use CMU Pocket Sphinx for speech recognition developed on Carnegie Mellon University, and Festival Speech Synthesis System for text speech developed on University of Edinburgh.

Currently the ROS package pocketsphinx is available for groovy and hydro distributions. We can install it executing the next command in a terminal:

sudo apt-get install ros-"groovy or hydro"-pocketsphinx

We have to install the ubuntu package CMU Pocket Sphinx in previous ROS distributions. We will execute the next command in a terminal to install it:

sudo apt-get install gstreamer0.10-pocketsphinx

Moreover for previous ROS distributions, we have to download the ROS stack "rharmony" by University of Albany in order to integrate CMU Pocket Sphinx in ROS. We will place in the ROS workspace and we will execute the next commands in a terminal:

svn checkout http://albany-ros-pkg.googlecode.com/svn/trunk/rharmony
rosmake --rosdep-install pocketsphinx

Festival Speech Synthesis System is intigrated in the ROS package sound_play.

Speech recognition

First, we will execute the next command in a terminal within our workspace in order to create a package named "voz_fer" that will content the voice control programs:

roscreate-pkg voz_fer pocketsphinx sound_play std_msgs roscpp

We have to select the vocabulary that we use to control the robot. We use short sentences for the orders in order to reduce the recognition mistakes. We will create a file named "comandos_voz.txt" within the folder "config" of the created package with the content that is show below (each sentence must be in a new line):

start speech
stop speech
go forward
go back
turn left
turn right
speed up
slow down
rotate left
rotate right
stop
point beer
point cocacola
fold arm

We will upload and compile the created file "comandos_voz.txt" in the next link in order to generate the recognition vocabulary:

Sphinx knowledge base tool

We will download and rename (commandos_voz.*) the generated files within the folder "config" of the generated package. We will create a file named "comados_voz.launch" within the folder "launch" of the package in order to start CMU Pocket Sphinx with our vocabulary:

<launch>

  <node name="recognizer" pkg="pocketsphinx" type="recognizer.py" output="screen">
    <param name="lm" value="$(find voz_fer)/config/comandos.lm"/>
    <param name="dict" value="$(find voz_fer)/config/comandos.dic"/>
  </node>
  
</launch>

We will execute the next command in a terminal in order to start the speech recognition:

roslaunch voz_fer comandos_voz.launch

The results of the speech recognition are published in the topic "recognizer/output" (message type std_msgs/String). We will execute the next command in a terminal in order to see the results of the speech recognition:

rostopic echo recognizer/output

Text speech

We have to use the class sound_play::SoundClient in order to use text speech in our program. The public functions of this class are shown below:

Sound 	builtinSound (int id)
 	Create a builtin Sound.
void 	play (int sound)
 	Play a buildin sound.
void 	playWave (const std::string &s)
 	Plays a WAV or OGG file.
void 	repeat (const std::string &s)
 	Say a string repeatedly.
void 	say (const std::string &s, const std::string &voice="voice_kal_diphone")
 	Say a string.
void 	setQuiet (bool state)
 	Turns warning messages on or off.
 	SoundClient (ros::NodeHandle &nh, const std::string &topic)
 	Create a SoundClient that publishes on the given topic.
 	SoundClient ()
 	Create a SoundClient with the default topic.
void 	start (int sound)
 	Play a buildin sound repeatedly.
void 	startWave (const std::string &s)
 	Plays a WAV or OGG file repeatedly.
void 	stop (int sound)
 	Stop playing a built-in sound.
void 	stopAll ()
 	Stop all currently playing sounds.
void 	stopSaying (const std::string &s)
 	Stop saying a string.
void 	stopWave (const std::string &s)
 	Stop playing a WAV or OGG file.
Sound 	voiceSound (const std::string &s)
 	Create a voice Sound.
Sound 	waveSound (const std::string &s)

We will use the function "say" that say once a string using the default synthesized voice "voice_kal_diphone". We have to use the function "sleep(seconds)" in order to pause the program execution to say the string.

Control program MYRAbot

This program allows us teleoperate MYRAbot and point objects with the arm using our voice. The program is subscribe to the topic "recognizer/output" (publised by pocketsphinx) and publishes the topics "cmd_vel" (base control) and "selected_object" (point objects using the program [[Objects recognition and position calculation (webcam)#To point objects (Choose)]|choose]). We will create a file named "control_voz.cpp" within the folder "src" of the created ``package with the content that is shown below:

  #include "std_msgs/Int16.h"
  #include "std_msgs/String.h"
  #include "geometry_msgs/Twist.h"
  #include "sound_play/sound_play.h"
 
  int inicio = 0;
  int rotate = 0;
  int turn = 0;
  int start = 0;
  int stop = 1;
  float linear_vel = 0.0;
  float angular_vel = 0.0;
  std_msgs::String comando_voz;  
 
  void sleepok(int t, ros::NodeHandle &n)
  {
       if (n.ok())
               sleep(t);
  } 
 
  void speech(const std_msgs::String& voz)
  {
	::comando_voz = voz;
	
	ros::NodeHandle n;	
       
	ros::Publisher object_pub_=n.advertise<std_msgs::Int16>("selected_object", 1); 
        
	ros::Publisher move_base_pub_=n.advertise<geometry_msgs::Twist>("cmd_vel", 1); 
	

	
	std_msgs::Int16 objeto;
	geometry_msgs::Twist mover_base;
	      
	if (::comando_voz.data == "start speech")
	{
		inicio = 1;
				
	}
	else if (::comando_voz.data == "stop speech")
	{
		inicio = 0;
				
	}
	else if (inicio == 1)
	{
		if (::comando_voz.data == "stop")
		{
			::linear_vel = 0.0;
			::angular_vel = 0.0;
			
		}
		else if (::comando_voz.data == "go forward")
		{
			
			::linear_vel = 0.1;
			::angular_vel = 0.0;
			
		}
		else if (::comando_voz.data == "go back")
		{
			
			::linear_vel = -0.1;
			::angular_vel = 0.0;
			
		}	
		else if (::comando_voz.data == "turn left")
		{
			if (::angular_vel <= 0 || ::rotate == 1)
			{
				::angular_vel = 0.1;
				
				::rotate = 0;
				::turn = 1;					
			}
			else
			{
			
				::angular_vel = ::angular_vel + 0.1;

			}
		}
		else if (::comando_voz.data == "turn right")
		{
			if (::angular_vel >= 0 || ::rotate == 1)
			{
				::angular_vel = -0.1;
				
				::rotate = 0;
				::turn = 1;				
			}
			else
			{
		
			         ::angular_vel = ::angular_vel - 0.1;

			}
		}
		else if (::comando_voz.data == "rotate left")
		{
			if (::angular_vel <= 0 || ::turn == 1)
			{
				::angular_vel = 0.1;
				
				::turn = 0;
				::rotate = 1;
			}
			
			::linear_vel = 0.0;
			
		}
		else if (::comando_voz.data == "rotate right")
		{
			if (::angular_vel >= 0 || ::turn == 1)
			{
				::angular_vel = -0.1;
				
				::turn = 0;
				::rotate = 1;				
			}
			
			::linear_vel = 0.0;

		}		
		else if (::comando_voz.data == "speed up")				
		{
			if (::linear_vel == 0)
			{
				if (::angular_vel > 0 && ::rotate == 1)
				{
					::angular_vel= ::angular_vel + 0.05;
				}
				else if (::rotate == 1)
				{
					::angular_vel = ::angular_vel - 0.05;
				}			
			}
			else
			{
				if (::linear_vel > 0)
				{
					::linear_vel = ::linear_vel + 0.05;
				}
				else
				{
					::linear_vel = ::linear_vel - 0.05;
				}
			}
		}
		else if (::comando_voz.data == "slow down")				
		{
			if (::linear_vel == 0)
			{
				if (::angular_vel > 0 && ::rotate == 1)
				{
					::angular_vel = ::angular_vel - 0.05;
				}
				else if (::rotate == 1)
				{
					::angular_vel = ::angular_vel + 0.05;
				}					
			}
			else
			{
				if (::linear_vel > 0)
				{
					::linear_vel = ::linear_vel - 0.05;
				}
				else
				{
					::linear_vel = ::linear_vel + 0.05;
				}			
			}
		}
		else if (::comando_voz.data == "point beer")
		{
			objeto.data = 1;
			
			object_pub_.publish(objeto);
		}
		else if (::comando_voz.data == "point cocacola")
		{
			objeto.data = 2;
			
			object_pub_.publish(objeto);
		}
		else if (::comando_voz.data == "fold arm")
		{
			objeto.data = 100;
			
			object_pub_.publish(objeto);
		}													
	}
  }
      
  int main(int argc, char **argv)
  {
         
        ros::init(argc, argv, "control_voz");  
 
        ros::NodeHandle n;
 
        ros::Subscriber speech_sub_= n.subscribe("/recognizer/output", 1, speech);
       
        ros::Publisher object_pub_=n.advertise<std_msgs::Int16>("selected_object", 1); 
        
        ros::Publisher move_base_pub_=n.advertise<geometry_msgs::Twist>("cmd_vel", 1);
	      
        
        ros::Rate loop_rate(10);
        
        geometry_msgs::Twist mover_base;
        
        sound_play::SoundClient voz_robot;
        
		sleepok(1, n);        
        
        while (ros::ok())
        {		
			mover_base.linear.x = ::linear_vel;
			mover_base.angular.z = ::angular_vel; 
			
			move_base_pub_.publish(mover_base);
			
			if (::comando_voz.data == "start speech" && ::start == 0)
			{
				voz_robot.say("bep bep. I am listening you");
				
				sleepok(4, n);
				
				::start = 1;
				::stop = 0;
			}
			else if (::comando_voz.data == "stop speech" && ::stop == 0)
			{
				voz_robot.say("bep bep. I stop listening");
				
				sleepok(4, n);
				
				::stop = 1;
				::start = 0;
			}			
			
			ros::spinOnce();
	
			loop_rate.sleep();			
        
		}                     
       
        ros::spin();  
 
        return 0;
  }

We have to add the next code line to the file "CMakeLists.txt" of the package in order to compile and generate the executable:

rosbuild_add_executable(control_voz src/control_voz.cpp)

We will execute the next commands in a terminal in order to compile:

roscd voz_fer
make

We will create a file named "control_voz.launch" within the folder "launch" of the package with the content that is shown below in order to start all the programs:

<launch>

  <include file="$(find voz_fer)/launch/comandos_voz.launch"/>

  <node name="soundplay_node" pkg="sound_play" type="soundplay_node.py"/>  

  <node name="control_voz" pkg="voz_fer" type="control_voz" />

  <include file="$(find brazo_fer)/launch/escoja.launch"/>
  
</launch>

We will execute the next command in a terminal in order to start the voice control:

roslaunch voz_fer control_voz.launch

We can see the MYRAbot model in gazebo using the voice control in the next video:

<videoflash>_hvuvlRyNt0</videoflash>
Watch in YouTube



Related articles
Other articles

MYRAbot's arm control (bioloid+arduino)
Objects recognition and position calculation (webcam)
MYRAbot's arm model for simulation (urdf+gazebo)
MYRAbot model for simulation (urdf+gazebo)
Integration of MYRAbot in moveIt! (gazebo+moveIt!)
MYRAbot's Teleoperation with xbox360 wireless controller (xbox360 controller+joy)

CeRVaNTeS' arm and base control (maxon+epos2)
CeRVaNTeS' arm model for simulation (urdf+gazebo)
CeRVaNTeS model for simulation (urdf+gazebo)
Integration of CeRVaNTeS in moveIt! (gazebo+moveIt!)
CeRVaNTeS' Teleoperation with xbox360 wireless controller (xbox360 controller+joy)



< go back to main