Abstract

Textual response generation is a pivotal yet challenging task for multimodal task-oriented dialog systems, which aims to generate an appropriate textual response given the multimodal context. Although existing efforts have achieved remarkable progress, they overlook two signals: the domain information, which can reveal the key points of the user's intention, and the user's historical dialogs, which can indicate the user's characteristics. To address this issue, we propose a novel domain-aware multimodal dialog system with distribution-based user characteristic modeling (named DMDU). DMDU contains three vital components: context-knowledge embedding extraction, domain-aware response generation, and distribution-based user characteristic injection. Specifically, the context-knowledge embedding extraction component extracts the embeddings of the multimodal context and the related knowledge, following existing studies. The domain-aware response generation component conducts domain-aware fine-grained intention modeling based on the context and knowledge embeddings, and thereby generates the textual response. Moreover, the distribution-based user characteristic injection component first captures the user's characteristics and current intention with Gaussian distributions, and then applies sampling-based contrastive semantic regularization to promote context representation learning. Experimental results on a public dataset demonstrate the effectiveness of DMDU. We release our code to facilitate future research.
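
To make the last component more concrete, the following is a minimal illustrative sketch of what Gaussian modeling followed by sampling-based contrastive regularization could look like. It is not the DMDU implementation: the diagonal Gaussian parameterization, the InfoNCE-style loss with in-batch negatives, the temperature value, and all names (`sample_gaussian`, `info_nce`, `user_mean`, `intent_mean`) are assumptions chosen for illustration, and the inputs are random placeholders rather than learned embeddings.

```python
import numpy as np


def sample_gaussian(mean, log_var, rng):
    """Draw one sample per row from a diagonal Gaussian (reparameterization form)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss with in-batch negatives.

    Row i of `positives` is treated as the positive for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
batch, dim = 4, 8

# Placeholder Gaussian parameters (assumed): one distribution per user for
# the characteristics, and one per dialog for the current intention.
user_mean = rng.standard_normal((batch, dim))
user_log_var = np.full((batch, dim), -2.0)
intent_mean = user_mean + 0.1 * rng.standard_normal((batch, dim))
intent_log_var = np.full((batch, dim), -2.0)

# Sample from both distributions, then pull matching pairs together.
user_sample = sample_gaussian(user_mean, user_log_var, rng)
intent_sample = sample_gaussian(intent_mean, intent_log_var, rng)
loss = info_nce(intent_sample, user_sample)
```

In this sketch the loss is small when each sampled intention is closest to the sample of its own user, and it grows when the pairing is shuffled; in a trained system such a term would be added to the response generation objective as a regularizer.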
