Abstract

Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision of 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.
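To make the kinds of supervision concrete, the sketch below models one labelled example as a Python dataclass. The field names, the `[-3, 3]` sentiment range, and the record layout are assumptions for illustration only; the abstract specifies just the categories of labels (sentiment/subjectivity, emotions, attributes) and the four languages, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of one CMU-MOSEAS-style labelled sentence.
# Field names and value ranges are illustrative assumptions, not
# the dataset's real format.

@dataclass
class MoseasExample:
    sentence: str             # transcribed spoken sentence
    language: str             # one of "es", "pt", "de", "fr"
    sentiment: float          # e.g. a score in [-3, 3] (assumed range)
    subjectivity: float       # degree of subjectivity (assumed scale)
    emotions: List[str] = field(default_factory=list)    # e.g. ["happiness"]
    attributes: List[str] = field(default_factory=list)  # other labelled attributes

ex = MoseasExample(
    sentence="Me encantó la película.",  # "I loved the movie."
    language="es",
    sentiment=2.0,
    subjectivity=1.0,
    emotions=["happiness"],
)
print(ex.language, ex.sentiment)
```

A real loader would read such records from the released files together with the aligned audio and visual features, but the per-sentence, multi-label structure would look much like this.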

Highlights

  • Humans use a coordinated multimodal signal to communicate with each other

  • As Artificial Intelligence (AI) increasingly blends into everyday life across the globe, there is a genuine need for intelligent entities capable of understanding multimodal language in different cultures

  (Published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 1801–1812, November 16–20, 2020. © 2020 Association for Computational Linguistics.)

  • We believe that data of this scale presents a step towards learning human communication at a more fine-grained level, with the long-term goal of building more equitable and inclusive NLP systems

Summary

Introduction

Humans use a coordinated multimodal signal to communicate with each other. While languages such as English, Chinese, and Spanish have resources for computational analysis of multimodal language (focusing on analysis of sentiment, subjectivity, or emotions (Yu et al., 2020; Poria et al., 2019; Zadeh et al., 2018b; Park et al., 2014; Wollmer et al., 2013; Poria et al., 2020)), other commonly spoken languages across the globe lag behind. We introduce a large-scale dataset for four languages: Spanish, Portuguese, German and French. The dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), contains 10,000 annotated sentences from across a wide variety of speakers and topics. We believe that data of this scale presents a step towards learning human communication at a more fine-grained level, with the long-term goal of building more equitable and inclusive NLP systems. We experiment with a state-of-the-art multimodal language model, and demonstrate that CMU-MOSEAS presents new challenges to the NLP community.

Related Resources
Computational Models of Multimodal Language
Acquisition and Verification
Labels
Privacy and Ethics
Annotator Selection
Label Statistics
Multimodal Feature Extraction
Findings
Experimental Baselines
