Abstract

In this paper, we present a multimodal dialogue system that combines a neural response generation mechanism, a reranking mechanism, and a rule-based avatar control mechanism. Our system was submitted to the open track of the Fifth Dialogue System Live Competition and won second place. Notably, in the preliminary round our system received the best human evaluation results for visual information control (i.e., the speaking style of the avatar). The competition evaluators' assessments indicated that our system generates natural utterances appropriate to the conversational context and topic, delivered in an appealing speaking style. Our analysis showed that techniques such as post-processing of speech recognition results and the final response selection method are effective. It also revealed room for improvement, notably in handling speech recognition errors and in the reranking module.
