Abstract

Multimodal dialogue systems have recently gained importance in several domains such as retail, travel, and fashion. Several existing works have improved the understanding and generation of multimodal dialogues. However, there is still considerable room to improve the quality of the generated textual responses due to insufficient information infusion between visual and textual semantics. Moreover, existing dialogue systems often generate flawed knowledge-aware responses for tasks such as providing product attributes and celebrity endorsements. To address these issues, we present a Transformer-based Multimodal Infusion Dialogue (TMID) system that extracts visual and textual information from dialogues via a transformer-based multimodal context encoder and employs a cross-attention mechanism to infuse information between the images and text of each utterance. Furthermore, TMID uses adaptive decoders to generate appropriate multimodal responses based on the user intentions it has determined with a state classifier, and it enriches the output responses by incorporating domain knowledge into the decoders. The results of extensive experiments on a multimodal dialogue dataset demonstrate that TMID achieves state-of-the-art performance, improving the BLEU-4 score by 13.03, NIST by 2.77, and image selection Recall@1 by 1.84%.
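
To make the cross-attention infusion step concrete, the sketch below shows one way text tokens and image regions can attend to each other before fusion. It is a minimal, illustrative example only: the module name, hidden size, head count, and pooling/fusion choices are assumptions, not the paper's actual TMID configuration.

```python
import torch
import torch.nn as nn


class CrossModalInfusion(nn.Module):
    """Illustrative cross-attention block: text tokens attend to image
    regions and vice versa, then the two streams are fused. Dimensions
    and layer choices are assumptions, not the paper's exact setup."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.txt_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_to_txt = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # Text queries attend over image-region keys/values.
        txt_ctx, _ = self.txt_to_img(text_feats, image_feats, image_feats)
        # Image queries attend over text-token keys/values, then are pooled
        # and broadcast back to the text length so the streams align.
        img_ctx, _ = self.img_to_txt(image_feats, text_feats, text_feats)
        img_ctx = img_ctx.mean(dim=1, keepdim=True).expand_as(txt_ctx)
        # Concatenate and project: a simple stand-in for the infusion step.
        return self.fuse(torch.cat([txt_ctx, img_ctx], dim=-1))


# Usage: a batch of 2 utterances with 20 text tokens and 36 image regions each.
if __name__ == "__main__":
    text = torch.randn(2, 20, 512)
    image = torch.randn(2, 36, 512)
    fused = CrossModalInfusion()(text, image)
    print(fused.shape)  # torch.Size([2, 20, 512])
```

The fused representations would then feed the adaptive decoders, with the state classifier selecting which decoder (textual or image response) to use for a given user intention.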
