Spontaneous speech dialogue system TOSBURG II—the user‐centered multimodal interface

Yoichi Takebayashi

doi:10.1002/scj.4690261407

Abstract

This paper considers the user-centered spontaneous speech dialogue system TOSBURG-II (Task-oriented dialogue system based on speech understanding and response generation) and discusses its design from the viewpoint of media technology and multimodal interfaces. The authors have developed component techniques, including spontaneous speech understanding, user-centered dialogue control, multimodal response generation, and speech response cancelling, all based on noise-immune word spotting and keywords. The guiding concept is that "no constraint is imposed on the user." By integrating these techniques, a real-time speech dialogue system for unspecified users is developed. A speech dialogue data acquisition and evaluation system is built on top of the real system. It records real speech data together with the intermediate results of processing in the dialogue system, such as keyword spotting, speech understanding, and dialogue processing. Beyond evaluating system performance, it can also be used to construct a speech dialogue corpus and to evaluate and improve human-factor aspects. Trial use and evaluation experiments with the real system and unspecified users verified that keyword-based spontaneous speech understanding, combined with the user's ability to interrupt and with multimodal responses, is useful in improving the naturalness and robustness of the dialogue.
