Abstract

Empathy involves comprehending and sharing the emotions of another person. In the realm of conversational AI, empathy refers to the system's capacity to understand the user's emotions and needs and to respond appropriately. Conversational AI with empathetic capabilities can enhance the user experience by making interactions more personalized and natural. Existing conversational AI systems commonly rely on machine learning algorithms to recognize emotions and the corresponding empathetic intents from annotated data. However, this approach has a significant limitation: annotation is expensive and time-consuming. Our present work takes a holistic approach to empathy in conversational AI, proposing a novel zero-shot multitask framework, the Zero-shot Intent Emotion Detection (ZIED) network, which identifies both emotions and intents in a multimodal setting. We developed an end-to-end model that concurrently captures textual, audio, and visual representations and integrates the modalities using cross-attention mechanisms. Our experimental results on the EmoInt-MD dataset show that incorporating all three modalities yields the best performance for both emotion and empathetic intent detection, with improvements of over 6% for intent and over 4% for emotion across various ratios of seen and unseen classes.
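
As a rough illustration of the cross-attention fusion described above, the following PyTorch sketch shows how text queries might attend over audio and visual key/value streams before a fused representation is produced. All module names, dimensions, and the concatenation step are illustrative assumptions, not the authors' published ZIED architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Minimal sketch of cross-attention fusion over text, audio, and
    visual features. Dimensions and design choices are assumptions for
    illustration, not the published ZIED model."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Text queries attend over audio and visual keys/values.
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Project the concatenated streams back to a joint representation,
        # which downstream emotion/intent heads could consume.
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, text, audio, visual):
        # Each input: (batch, seq_len, dim) per-modality encodings.
        ta, _ = self.text_audio(text, audio, audio)     # text attends to audio
        tv, _ = self.text_visual(text, visual, visual)  # text attends to visual
        fused = torch.cat([text, ta, tv], dim=-1)       # concatenate streams
        return self.proj(fused)                         # joint representation

# Example usage with dummy modality features.
B, T, D = 2, 10, 256
fusion = CrossModalFusion(dim=D)
out = fusion(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
print(out.shape)  # torch.Size([2, 10, 256])
```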
