Abstract

A barge-in free spoken dialogue interface using sound field control and a microphone array is proposed. In conventional spoken dialogue systems that use an acoustic echo canceller, estimating the room transfer function is indispensable, especially when the transfer function changes because of various interferences. However, this estimation is difficult when the user and the system speak simultaneously. To resolve this problem, we propose a sound field control technique that prevents the response sound from being observed at the microphones. Combined with a microphone array, the proposed method achieves high elimination performance without any adaptive process. The efficacy of the proposed interface is confirmed in experiments evaluating sound elimination and speech recognition performance.
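The summary does not give the authors’ control formulation. As a minimal sketch under stated assumptions, the idea of keeping the response sound from being observed can be posed at a single frequency bin: with M microphones and L loudspeakers, choose complex loudspeaker weights w so that the response sound cancels at every microphone (G w = 0) while the user’s listening point still receives it with unit gain (g_u w = 1). The minimum-norm solution is w = A^H (A A^H)^(-1) b, where A stacks G and g_u and b = (0, ..., 0, 1). The unit-gain user constraint, the random transfer functions, and the names design_control_filters, g_mics, and g_user are hypothetical illustrations; the microphone-array stage of the proposed interface is not modeled here.

```python
import random

def solve(a, b):
    """Solve the square complex system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

def design_control_filters(g_mics, g_user):
    """Hypothetical helper: minimum-norm weights w with g_mics @ w = 0 and g_user @ w = 1.

    g_mics: M rows (one per microphone), each holding L complex transfer functions
            from the L loudspeakers to that microphone at one frequency bin.
    g_user: L complex transfer functions from the loudspeakers to the user.
    Requires L >= M + 1 so the constraints can be met exactly.
    """
    a = [list(row) for row in g_mics] + [list(g_user)]  # (M + 1) x L constraint matrix A
    b = [0j] * len(g_mics) + [1 + 0j]                    # silence at mics, unit gain at user
    # Minimum-norm solution w = A^H (A A^H)^{-1} b.
    gram = [[sum(p * q.conjugate() for p, q in zip(ri, rj)) for rj in a] for ri in a]
    y = solve(gram, b)
    return [sum(a[i][k].conjugate() * y[i] for i in range(len(a))) for k in range(len(g_user))]

def response(transfer_row, w):
    """Complex pressure produced at one point by driving the loudspeakers with w."""
    return sum(g * wk for g, wk in zip(transfer_row, w))

# Hypothetical example: 2 microphones, 4 loudspeakers, random transfer functions.
random.seed(0)
M, L = 2, 4

def rand_tf():
    return complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))

g_mics = [[rand_tf() for _ in range(L)] for _ in range(M)]
g_user = [rand_tf() for _ in range(L)]
w = design_control_filters(g_mics, g_user)
print([abs(response(row, w)) for row in g_mics])  # each value should be close to 0
print(abs(response(g_user, w)))                   # should be close to 1
```

Because the weights depend only on the transfer functions, they can be computed once and applied with no update while the user speaks. Loudspeakers beyond the M + 1 needed to satisfy the constraints give the minimum-norm solution extra freedom to keep the weights small.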

Highlights

  • For hands-free realization of smooth communication with a spoken dialogue system, it should be guaranteed that a user’s command utterance reaches the system clearly.

  • To achieve robustness, we propose a new interface for a barge-in free spoken dialogue system that combines multichannel sound field control and a microphone array.

  • The results show that increasing both the number of microphone elements and the number of loudspeakers improves the performance of the proposed method and can make the control robust against fluctuations of the room transfer functions.

Summary

Introduction

For hands-free realization of smooth communication with a spoken dialogue system, it should be guaranteed that a user’s command utterance reaches the system clearly. A user might interrupt the system’s sound responses to utter a command, or might start speaking before the system’s sound responses have ended. In such a situation, the sound the system plays to the user is observed as an acoustic echo at the microphone used to acquire the user’s speech input, and it degrades speech recognition of the user’s command.

A conventional acoustic echo canceller removes this echo by adaptively estimating the room transfer function from the loudspeaker to the microphone. In the barge-in state (the so-called “double-talk problem”), however, the user’s speech is mixed into the observed signal, so it acts as noise to this estimation and the estimation fails. In this case, the adaptation process must be stopped by some type of double-talk detection technique [8, 9]. Consequently, when the room transfer function changes in the barge-in state, the change cannot be tracked and the elimination performance degrades.
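The conventional pipeline described above (adaptive estimation of the room transfer function, frozen during double talk) can be illustrated with a small simulation. This is a minimal sketch assuming a normalized LMS (NLMS) echo canceller and a Geigel-style double-talk detector, both common textbook choices picked here for illustration; the detectors actually cited as [8, 9] may differ. The function names, step size, threshold, echo path, and simulated signals are all hypothetical.

```python
import random

def geigel_double_talk(mic_sample, far_history, threshold=0.5):
    """Geigel-style detector: flag double talk when the microphone sample is
    larger than `threshold` times the largest recent loudspeaker sample."""
    return abs(mic_sample) > threshold * max(abs(x) for x in far_history)

def nlms_echo_canceller(far, mic, taps=16, mu=0.5, eps=1e-6, detect=True):
    """Hypothetical NLMS estimate of the loudspeaker-to-microphone path.

    Returns the residual (what the recognizer would receive), a per-sample flag
    telling whether the filter adapted, and the final FIR estimate.
    """
    h = [0.0] * taps      # estimated room transfer function (FIR taps)
    buf = [0.0] * taps    # recent loudspeaker samples, newest first
    residual, adapted = [], []
    for x, d in zip(far, mic):
        buf = [x] + buf[:-1]
        y = sum(hk * xk for hk, xk in zip(h, buf))  # echo replica
        e = d - y
        residual.append(e)
        freeze = detect and geigel_double_talk(d, buf)
        if not freeze:  # adapt only when no double talk is flagged
            norm = sum(xk * xk for xk in buf) + eps
            h = [hk + mu * e * xk / norm for hk, xk in zip(h, buf)]
        adapted.append(not freeze)
    return residual, adapted, h

def misalignment(h, true_path):
    """Squared distance between the estimate and the true (zero-padded) path."""
    padded = list(true_path) + [0.0] * (len(h) - len(true_path))
    return sum((a - b) ** 2 for a, b in zip(h, padded))

# Hypothetical simulation: white-noise system response, short echo path whose
# absolute taps sum to 0.45 (< 0.5, so echo alone never trips the detector),
# and a loud user utterance ("barge-in") between samples 2000 and 2500.
random.seed(0)
echo_path = [0.3, 0.0, -0.1, 0.05]
n = 4000
far = [random.gauss(0.0, 1.0) for _ in range(n)]
echo = [sum(c * far[i - k] for k, c in enumerate(echo_path) if i >= k) for i in range(n)]
near = [3.0 * random.gauss(0.0, 1.0) if 2000 <= i < 2500 else 0.0 for i in range(n)]
mic = [a + b for a, b in zip(echo, near)]

# Filter state at the end of the barge-in, with and without the detector.
_, _, h_on = nlms_echo_canceller(far[:2500], mic[:2500], detect=True)
_, _, h_off = nlms_echo_canceller(far[:2500], mic[:2500], detect=False)
print("misalignment after barge-in, detector on :", misalignment(h_on, echo_path))
print("misalignment after barge-in, detector off:", misalignment(h_off, echo_path))
```

With the detector disabled, the loud near-end speech enters the update as noise and should pull the estimate well away from echo_path; with it enabled, most barge-in samples are skipped. The same freeze is what leaves a conventional canceller unable to follow a room transfer function that changes during barge-in.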
