Abstract
Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision of 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.
Highlights
Humans use a coordinated multimodal signal to communicate with each other
As Artificial Intelligence (AI) increasingly blends into everyday life across the globe, there is a genuine need for intelligent entities capable of understanding multimodal language in different cultures
Published in the Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 1801–1812, November 16–20, 2020. © 2020 Association for Computational Linguistics
We believe that data of this scale presents a step towards learning human communication at a more fine-grained level, with the long-term goal of building more equitable and inclusive NLP systems
Summary
Humans use a coordinated multimodal signal to communicate with each other. While English, Chinese, and Spanish have resources for computational analysis of multimodal language (focusing on analysis of sentiment, subjectivity, or emotions (Yu et al., 2020; Poria et al., 2019; Zadeh et al., 2018b; Park et al., 2014; Wollmer et al., 2013; Poria et al., 2020)), other commonly spoken languages across the globe lag behind. We introduce a large-scale dataset for four languages: Spanish, Portuguese, German and French. The dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), contains 10,000 annotated sentences per language, drawn from a wide variety of speakers and topics. We believe that data of this scale presents a step towards learning human communication at a more fine-grained level, with the long-term goal of building more equitable and inclusive NLP systems. We experiment with a state-of-the-art multimodal language model, and demonstrate that CMU-MOSEAS presents new challenges to the NLP community.
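To make the label structure described above concrete, here is a minimal sketch of what one annotated CMU-MOSEAS sentence might look like in code. The field names, value ranges, and example values are illustrative assumptions, not the dataset's actual schema; the paper only specifies that each sentence carries 20 labels covering sentiment (and subjectivity), emotions, and attributes, in one of four languages.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedSentence:
    """Hypothetical record for one labelled CMU-MOSEAS sentence."""
    language: str                 # one of the four covered languages, e.g. "es"
    transcript: str               # the spoken words of the sentence
    sentiment: float              # illustrative: a score on a signed scale
    subjectivity: float           # illustrative: a score in [0, 1]
    emotions: dict = field(default_factory=dict)    # emotion name -> intensity
    attributes: dict = field(default_factory=dict)  # additional labels

# Illustrative example (values are made up, not taken from the dataset).
example = AnnotatedSentence(
    language="es",
    transcript="Me encanta esta película.",
    sentiment=2.5,
    subjectivity=0.9,
    emotions={"happiness": 0.8},
    attributes={"topic": "movie review"},
)
print(example.language, example.sentiment)
```

In practice the sentences also come with the corresponding audio and video segments, so a real loader would attach acoustic and visual features to each record as well.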