Abstract

Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue has been missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV series Friends. Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual, and textual modalities. We propose several strong multimodal baselines and show the importance of contextual and multimodal information for emotion recognition in conversations. The full dataset is available for use at http://affective-meld.github.io.
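
As a rough illustration of how the per-utterance annotations described above might be consumed, the minimal sketch below loads one split of MELD and groups utterances into dialogues. The file name train_sent_emo.csv and the column names (Dialogue_ID, Utterance, Speaker, Emotion, Sentiment) are assumptions about the released format, not details stated in the abstract.

    # Minimal sketch, assuming MELD ships per-split CSVs with one row per
    # utterance and the (assumed) columns used below.
    import pandas as pd

    train = pd.read_csv("train_sent_emo.csv")  # assumed file name for the training split

    # Group utterances into dialogues so a model can exploit conversational context.
    for dialogue_id, dialogue in train.groupby("Dialogue_ID"):
        utterances = dialogue["Utterance"].tolist()
        speakers = dialogue["Speaker"].tolist()      # several speakers per dialogue
        emotions = dialogue["Emotion"].tolist()      # e.g. joy, anger, sadness, neutral, ...
        sentiments = dialogue["Sentiment"].tolist()  # positive / negative / neutral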

Highlights

  • With the rapid growth of Artificial Intelligence (AI), multimodal emotion recognition has become a major research topic, primarily due to its potential applications in many challenging tasks, such as dialogue generation, user behavior understanding, multimodal interaction, and others.

  • The remainder of the paper is organized as follows: Section 2 illustrates the EmotionLines dataset; we present the Multimodal EmotionLines Dataset (MELD) in Section 3; strong baselines and experiments are elaborated in Section 4; future directions and applications of MELD are covered in Sections 5 and 6, respectively; Section 7 concludes the paper.

  • We introduced MELD, a multimodal multi-party conversational emotion recognition dataset.


Introduction

With the rapid growth of Artificial Intelligence (AI), multimodal emotion recognition has become a major research topic, primarily due to its potential applications in many challenging tasks, such as dialogue generation, user behavior understanding, multimodal interaction, and others. A conversational emotion recognition system can be used to generate appropriate responses by analyzing user emotions (Zhou et al., 2017; Rashkin et al., 2018). Recent work proposes solutions based on multimodal memory networks (Hazarika et al., 2018), but these are mostly limited to dyadic conversations and do not scale to emotion recognition in conversations (ERC) with multiple interlocutors. This calls for a multi-party conversational data resource that can encourage research in this direction.
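
To make the scalability point concrete, the sketch below (an illustration only, not the method of Hazarika et al., 2018, and with hypothetical class and variable names) keeps one recurrent state per speaker, so the same loop handles two or more interlocutors.

    import torch
    import torch.nn as nn

    class MultiPartyContextTracker(nn.Module):
        """Toy per-speaker context tracker: one GRU cell shared across speakers,
        one hidden state per speaker, so the party size is not hard-coded."""

        def __init__(self, utt_dim: int, hidden_dim: int, num_emotions: int):
            super().__init__()
            self.cell = nn.GRUCell(utt_dim, hidden_dim)
            self.classifier = nn.Linear(hidden_dim, num_emotions)
            self.hidden_dim = hidden_dim

        def forward(self, utterance_feats, speakers):
            # utterance_feats: (seq_len, utt_dim) features for one dialogue
            # speakers: list of speaker names, one per utterance
            states = {}  # speaker -> hidden state; grows with the number of speakers
            logits = []
            for feat, spk in zip(utterance_feats, speakers):
                prev = states.get(spk, torch.zeros(1, self.hidden_dim))
                states[spk] = self.cell(feat.unsqueeze(0), prev)
                logits.append(self.classifier(states[spk]))
            return torch.cat(logits, dim=0)  # (seq_len, num_emotions)

Under this design the number of speaker states grows with the party size instead of being fixed at two, which is the scalability gap noted above.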


