Abstract
This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signal are difficult, because the signal-to-noise ratio is quite low (around −3 dB) and noise is not stable due to interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio–visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real-time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity difference estimated by scattering theory. Since features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition as well as confidence measure of ASR results to select the best one. The resulting system shows an improvement of about 10% on average against recognition of three simultaneous speech signals, where three speakers were located around the humanoid on a 1 m radius half circle, one of them being in front of him (angle 0°) and the other two being at symmetrical positions (± θ) varying by 10° steps from 0° to 90°.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.