Abstract
A barge-in free spoken dialogue interface using sound field control and a microphone array is proposed. In a conventional spoken dialogue system that uses an acoustic echo canceller, it is indispensable to estimate the room transfer function, especially when the transfer function is changed by various interferences. However, this estimation is difficult when the user and the system speak simultaneously. To resolve this problem, we propose a sound field control technique that prevents the response sound from being observed at the microphones. Combined with a microphone array, the proposed method can achieve high elimination performance with no adaptive process. The efficacy of the proposed interface is confirmed in experiments that evaluate sound elimination and speech recognition.
Highlights
For hands-free realization of smooth communication with a spoken dialogue system, it should be guaranteed that a user’s command utterance reaches the system clearly
In order to achieve robustness, we propose a new interface for a barge-in free spoken dialogue system that combines multichannel sound field control and a microphone array
Increasing both the number of microphone elements and the number of loudspeakers improves the performance of the proposed method and can make the control robust against fluctuations of the room transfer functions
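The idea of keeping the response sound away from the microphones can be illustrated with a small numerical sketch. The text here gives no formulas, so everything below is an assumption made for illustration: a single frequency bin, fixed and known transfer functions filled with random stand-in values, one listening point for the user, and a minimum-norm least-squares solution for the loudspeaker weights. The names `G_mics`, `g_user`, and `h` are hypothetical, and this is not presented as the authors' actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, n_mics = 8, 4  # hypothetical array sizes

def random_transfer(rows, cols):
    """Stand-in complex transfer functions at one frequency bin (assumption)."""
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

G_mics = random_transfer(n_mics, n_speakers)  # loudspeakers -> microphone elements
g_user = random_transfer(1, n_speakers)       # loudspeakers -> user's listening point

# Stack all control points and state the desired pressure at each one:
# zero at every microphone element, unit response at the user.
G = np.vstack([G_mics, g_user])
d = np.zeros(n_mics + 1, dtype=complex)
d[-1] = 1.0

# Minimum-norm loudspeaker weights that satisfy G @ h = d.
h = np.linalg.pinv(G) @ d

leak = np.abs(G_mics @ h).max()  # response sound reaching the microphones
at_user = (g_user @ h)[0]        # response sound reaching the user
print(f"max leak at microphones: {leak:.2e}, response at user: {abs(at_user):.3f}")
```

With more loudspeakers than control points the system is underdetermined, so the constraints can be met exactly in this idealized setting; real room transfer functions fluctuate, which is the robustness issue the highlights refer to.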
Summary
For hands-free realization of smooth communication with a spoken dialogue system, it should be guaranteed that a user’s command utterance reaches the system clearly. A user might interrupt sound responses from the system to utter a command, or might start speaking before the sound responses from the system have finished. In such a situation, the sound given from the system to the user is observed as an acoustic echo return at the microphone used to acquire the user’s speech input, and it degrades speech recognition performance for the user’s input command. In the state of barge-in (this is called a “double-talk problem”), the user’s speech input is mixed into the observed signal, so the speech acts as noise to the estimation of the room transfer function and the estimation fails. In this case, the adaptation process should be stopped by some type of double-talk detection technique [8, 9]. When the room transfer function changes in the barge-in state, however, adaptation cannot follow the change and the elimination performance degrades.
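The double-talk problem described above can be reproduced with a toy adaptive echo canceller. This is a minimal sketch under stated assumptions: the adaptive algorithm is NLMS (the text does not name one), the room is a hypothetical 32-tap decaying impulse response, and white noise stands in for both the system response and the user's speech.

```python
import numpy as np

def nlms_estimate(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Estimate the echo path with NLMS; returns the final filter weights."""
    w = np.zeros(taps)
    buf = np.zeros(taps)  # most recent far-end samples, newest first
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        e = mic[n] - w @ buf                   # residual after echo removal
        w += mu * e * buf / (buf @ buf + eps)  # normalized LMS update
    return w

rng = np.random.default_rng(0)
taps = 32
# Hypothetical room impulse response: alternating sign, exponential decay.
h = np.exp(-0.2 * np.arange(taps)) * (-1.0) ** np.arange(taps)

response = rng.standard_normal(8000)          # system's response sound
echo = np.convolve(response, h)[:len(response)]
user = rng.standard_normal(8000)              # stand-in for the user's speech

w_single = nlms_estimate(response, echo)         # only the system speaks
w_double = nlms_estimate(response, echo + user)  # barge-in: both speak

def misalignment(w):
    """Relative error between the estimated and the true echo path."""
    return np.linalg.norm(w - h) / np.linalg.norm(h)

print("single-talk misalignment:", misalignment(w_single))
print("double-talk misalignment:", misalignment(w_double))
```

In this sketch the estimate converges when only the system speaks, while under double-talk the user's signal perturbs every update and the misalignment stays large. Freezing adaptation avoids that, but then a change in the room transfer function goes untracked.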