EmoInt-Trans: A Multimodal Transformer for Identifying Emotions and Intents in Social Conversations

Gopendra Vikram Singh, Mauajama Firdaus, Pushpak Bhattacharyya, Asif Ekbal

doi:10.1109/taslp.2022.3224287

Abstract

In the natural language processing community, open-domain conversational agents, also known as chatbots, are gaining popularity. One of the difficulties is getting them to communicate in an emotionally intelligent manner. To generate dialogues, current neural response generation methods depend solely on end-to-end learning from large-scale conversation data. To address this, we introduce a large-scale multi Emotion and Intent guided Multimodal Dialogue (EmoInt-MD) dataset labelled with 32 emotions and 15 empathetic intents, comprising 32k dialogues taken from different movie genres. We propose a novel multi-task multimodal contextual Transformer framework for simultaneously identifying the emotions and intents in a given utterance, utilizing audio and visual features in addition to the textual information. Experimental analysis shows that the proposed framework outperforms several unimodal and multimodal baselines on the EmoInt-MD dataset. This dataset, along with our baseline and proposed framework implementations, will be made publicly available for research purposes.

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2023
Citations: 8

Similar Papers

Multimodal PLSA for Movie Genre Classification
Hao-Zhi Hong ... Jen-Ing G Hwang
01 Jan 2015

Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise
Ibrahim Almajai ... Jonathan Darch
17 Sep 2006

End-to-end multimodal clinical depression recognition using deep neural networks: A comparative analysis
Muhammad Muzammel ... Alice Othmani
Computer Methods and Programs in Biomedicine | VOL. 211
28 Sep 2021

A Taxonomy of Empathetic Response Intents in Human Social Conversations
Anuradha Welivita ... Pearl Pu
01 Jan 2020
