Zero-crossing-based speech segregation and recognition for humanoid robots

Sung Jun An, Young-Ik Kim, Rhee Man Kil

doi:10.1109/tce.2009.5373808

Abstract

Humanoid robots attract interest because their overall appearance resembles the human body, which allows them to interact with humans and the surrounding environment. For auditory interaction with humans, it is desirable that humanoid robots have a capacity similar to that of the human auditory information processing system. This is a very difficult task, since current automatic speech recognition (ASR) systems are not very robust to noise and it is hard to attend to a selected speech source. In this context, this paper presents a new method of zero-crossing-based binaural mask estimation for speech segregation and recognition when multiple sound sources are present simultaneously. The proposed method provides high speech segregation and recognition performance while offering significantly lower computational complexity than conventional methods based on cross-correlation. We expect this method to provide an effective tool for auditory interaction with humanoid robots using the sensory information of binaural sounds.
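As a rough illustration of why zero-crossings can be cheaper than cross-correlation, the sketch below estimates the interaural time difference (ITD) between a left and a right channel by pairing upward zero-crossings. It is not the authors' algorithm: the function names, the nearest-crossing pairing rule, and the `max_itd` limit are assumptions made for this example, and the paper's mask estimation step is not reproduced.

```python
import math


def upward_zero_crossings(signal, sample_rate):
    """Return the times (in seconds) of upward zero-crossings.

    Each crossing is located by linear interpolation between the
    negative sample and the following non-negative sample.
    """
    times = []
    for n in range(len(signal) - 1):
        a, b = signal[n], signal[n + 1]
        if a < 0.0 <= b:
            frac = -a / (b - a)  # fractional position of the crossing
            times.append((n + frac) / sample_rate)
    return times


def estimate_itd(left, right, sample_rate, max_itd=1e-3):
    """Estimate the ITD (right minus left, in seconds).

    Hypothetical pairing rule: each left crossing is matched with the
    nearest right crossing, and pairs further apart than max_itd are
    discarded. Returns None if no pair survives.
    """
    left_times = upward_zero_crossings(left, sample_rate)
    right_times = upward_zero_crossings(right, sample_rate)
    if not right_times:
        return None
    diffs = []
    for t in left_times:
        nearest = min(right_times, key=lambda u: abs(u - t))
        d = nearest - t
        if abs(d) <= max_itd:
            diffs.append(d)
    return sum(diffs) / len(diffs) if diffs else None
```

For a single-frequency channel this only needs a pass over the samples and a handful of timestamp comparisons, whereas a cross-correlation evaluates a sum of products for every candidate lag. A real binaural front end would apply such an estimate per frequency channel of a filter bank.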
