Abstract
In this paper, we present a multimodal dialogue system that combines a neural response generation mechanism, a reranking mechanism, and a rule-based avatar control mechanism. We submitted our system to the open track of the Fifth Dialogue System Live Competition, where it placed second. Notably, in the preliminary round our system achieved the best human evaluation results for visual information control (i.e., the avatar's speaking style). The competition evaluators' assessments indicate that our system generates natural utterances appropriate to the conversational context and topic, delivered in an appealing speaking style. Our analysis shows that design choices such as post-processing of speech recognition results and the final response selection method are effective. It also reveals room for improvement, including remaining speech recognition errors and challenges in the reranking module.