Abstract

Multimodal dialogue systems (MDSs) allow users to converse in natural language with virtual agents that sense the users' multimodal behavior. One crucial step in developing an MDS is measuring how well the dialogue system performs. Although previous research has focused on modeling user satisfaction from the linguistic modality in text-to-text dialogue systems, user satisfaction is conveyed not only by the content of the spoken dialogue but also by the users' acoustic and visual nonverbal behavior. Multimodal social signal sensing offers a way to evaluate dialogue systems automatically on the basis of subjective evaluation. Against this background, we proposed a multimodal model that recognizes user satisfaction using sequence modeling algorithms (RNN, LSTM, and GRU). Recognizing the user satisfaction label at the dialogue level, where each label is annotated by the user for the dialogue as a whole, is a novel challenge. We extracted both verbal and nonverbal features at the exchange level (the unit is a pair of system and user utterances) and analyzed how multimodal and unimodal features contribute to recognizing user satisfaction labels at the dialogue level. We used a multimodal user-system dialogue corpus with user satisfaction labels at the dialogue level. To validate the recognition accuracy of the proposed multimodal modeling approach, we compared it with two models based on human perception: one by external human coders and one by the system operator (called “Wizard”) with whom the user talks. The experimental results showed that the multimodal model achieved better performance in both classification and regression tasks, and that its performance was higher than that of the human models.
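The mapping from exchange-level features to a single dialogue-level estimate can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the modalities are fused by simple concatenation (the abstract does not say how they are combined), uses a plain Elman RNN with random, untrained weights, and all function names and feature values are hypothetical placeholders.

```python
import math
import random


def fuse(verbal, acoustic, visual):
    """Early fusion (an assumption): concatenate one exchange's modality features."""
    return verbal + acoustic + visual


def init_params(input_dim, hidden_dim, seed=0):
    """Random, untrained weights for a single-layer Elman RNN (illustration only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_x": matrix(hidden_dim, input_dim),   # input-to-hidden weights
        "W_h": matrix(hidden_dim, hidden_dim),  # hidden-to-hidden weights
        "b": [0.0] * hidden_dim,
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)],
        "b_out": 0.0,
    }


def rnn_dialogue_score(exchanges, params):
    """Read the exchanges in order and map the final hidden state to a score in (0, 1)."""
    hidden = [0.0] * len(params["b"])
    for x in exchanges:
        # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(wx_row, x))
                + sum(w * hj for w, hj in zip(wh_row, hidden))
                + bias
            )
            for wx_row, wh_row, bias in zip(params["W_x"], params["W_h"], params["b"])
        ]
    logit = sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


# One dialogue of three exchanges; the feature values are made up for illustration.
dialogue = [
    fuse([0.2, 0.1], [0.5], [0.3, 0.0]),
    fuse([0.4, 0.0], [0.1], [0.9, 0.2]),
    fuse([0.1, 0.3], [0.7], [0.4, 0.1]),
]
params = init_params(input_dim=5, hidden_dim=4)
score = rnn_dialogue_score(dialogue, params)
```

In this sketch a unimodal variant would pass only one modality's features per exchange, with `input_dim` adjusted to match. Thresholding `score` would give a classification output, while a regression variant would drop the final sigmoid.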
