Kinect microphone array-based speech and speaker recognition for the exhibition control of humanoid robots

Ing-Jr Ding, Jia-Yi Shi

doi:10.1016/j.compeleceng.2015.12.010

Abstract

This study developed a Kinect microphone array-based method for the voice control of humanoid robot exhibitions through speech and speaker recognition. A support vector machine (SVM), a Gaussian mixture model (GMM), and dynamic time warping (DTW) were used for speaker verification, speaker identification, and speech recognition, respectively, and were combined to realize advanced voice-based control of humanoid robot exhibitions. Speech recognition was enhanced by using the Kinect microphone array and by combining the DTW-based recognition decisions from all the microphones through a fuzzy control scheme. A humanoid robot equipped with the proposed method can be controlled through voice commands issued by authenticated users. The robot first verifies the authenticity of the operator, then identifies the operator and validates the command, and executes the command only if both the user and the command are valid. Experimental results demonstrated the effectiveness and accuracy of the proposed method.
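
The verify, identify, recognize, then execute flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The DTW routine is the textbook formulation on 1-D sequences, whereas a real system would compare frames of acoustic features. The fusion step is a plain weighted average of per-microphone DTW distances, used here only as a stand-in for the paper's fuzzy control scheme. The SVM verification and GMM identification results are assumed to arrive as the inputs `is_verified` and `speaker_id`. The names `recognize_command`, `control_step`, `templates`, and `threshold` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_command(channels, templates, weights=None):
    """Return (command, score) with the lowest fused DTW distance.

    Assumption: fusion is a weighted average over microphone channels,
    a simple placeholder for the fuzzy control scheme named in the abstract.
    """
    if weights is None:
        weights = [1.0 / len(channels)] * len(channels)
    scores = {
        name: sum(w * dtw_distance(ch, template) for w, ch in zip(weights, channels))
        for name, template in templates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


def control_step(is_verified, speaker_id, channels, templates, threshold):
    """Return the command to execute, or None if any check fails.

    is_verified and speaker_id stand in for the SVM and GMM stages.
    """
    if not is_verified or speaker_id is None:
        return None  # speaker rejected: ignore the utterance
    command, score = recognize_command(channels, templates)
    if score > threshold:
        return None  # no template is close enough: command is invalid
    return command


# Toy data: four channels (the Kinect array has four microphones),
# each a time-warped copy of the hypothetical "forward" template.
templates = {"forward": [0, 1, 2, 1, 0], "stop": [2, 2, 2, 2, 2]}
channels = [
    [0, 1, 2, 1, 0],
    [0, 1, 1, 2, 1, 0],
    [0, 0, 1, 2, 1, 0],
    [0, 1, 2, 2, 1, 0],
]
result = control_step(True, "operator_1", channels, templates, threshold=1.0)
```

Because DTW absorbs the repeated samples in each channel, every channel matches the "forward" template closely and the gated step returns that command. An unverified speaker yields `None` regardless of what was said.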

Full Text