Multi-Label Emotion Classification on Code-Mixed Text: Data and Methods

Grigori Sidorov, Rao Muhammad Adeel Nawab, Helena Gomez-Adorno, Iqra Ameer

doi:10.1109/access.2022.3143819

Open Access


Abstract

The multi-label emotion classification task aims to identify all possible emotions in a written text that best represent the author’s mental state. In recent years, multi-label emotion classification has attracted the attention of researchers due to its potential applications in e-learning, health care, marketing, and other domains. Standard benchmark corpora are needed to develop and evaluate multi-label emotion classification methods. The majority of benchmark corpora were developed for the English language (monolingual corpora) using tweets. However, the multi-label emotion classification problem has not been explored for code-mixed text, for example, English and Roman Urdu, although code-mixed text is widely used in Facebook posts and comments, tweets, and SMS messages, particularly by the South Asian community. To fill this gap, this study presents a large benchmark corpus for the multi-label emotion classification task, which comprises 11,914 code-mixed (English and Roman Urdu) SMS messages. Each SMS message was manually annotated using a set of 12 emotions: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust, and neutral (no emotion). As a secondary contribution, we applied and compared state-of-the-art classical machine learning (content-based methods: three word n-gram features and eight character n-gram features), deep learning (CNN, RNN, Bi-RNN, GRU, Bi-GRU, LSTM, and Bi-LSTM), and transfer learning-based methods (BERT and XLNet) on our proposed corpus. After extensive experimentation, the best results were obtained using state-of-the-art classical machine learning methods on word uni-grams (Micro Precision = 0.67, Micro Recall = 0.54, Micro F1 = 0.67) with a combination of the OVR multi-label and SVC single-label machine learning algorithms. Our proposed corpus is free and publicly available for research purposes to foster research in an under-resourced language (Roman Urdu).
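
The micro-averaged measures reported in the abstract can be illustrated with a minimal sketch. It assumes the common pooled definition, in which true positives, false positives, and false negatives are summed over all messages and labels before precision, recall, and F1 are computed; the paper's own evaluation script is not shown here and may differ. The function name `micro_prf` and the toy labels are hypothetical and are not taken from the corpus.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 for multi-label predictions.

    Each element of `gold` and `pred` is the set of emotion labels
    for one message. Counts are pooled over all messages and labels.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of messages")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy, made-up annotations for three messages (illustration only).
gold = [{"joy", "love"}, {"anger"}, {"sadness", "fear"}]
pred = [{"joy"}, {"anger", "disgust"}, {"sadness", "fear"}]

precision, recall, f1 = micro_prf(gold, pred)
print(f"Micro P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Because counts are pooled before averaging, frequent emotions influence the micro-averaged scores more than rare ones.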

Highlights

A single piece of text may contain one or more emotions.
The "DL Algorithms" label refers to the deep learning algorithms used in this study: Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), Gated Recurrent Unit (GRU), Bidirectional Gated Recurrent Unit (Bi-GRU), Recurrent Neural Network (RNN), Bidirectional Recurrent Neural Network (Bi-RNN), and Convolutional Neural Network (CNN).
The results show that word 2-grams are the most suitable features when English and Roman Urdu SMS messages are combined for training (Train-E-CM-MEC-21 + Train-RUCM-MEC-21) and the model is tested on the code-mixed SMS messages (Test-CM-CM-MEC-21).
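
To make the word and character n-gram features mentioned above concrete, here is a minimal sketch of how such n-grams could be extracted from a message. It assumes lowercasing and whitespace tokenization, which may not match the preprocessing used in the study; the helper names `word_ngrams` and `char_ngrams` and the example message are hypothetical and not drawn from the corpus.

```python
from typing import List


def word_ngrams(text: str, n: int) -> List[str]:
    """Return contiguous word n-grams using simple whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Return contiguous character n-grams, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


# Made-up code-mixed (English and Roman Urdu) message, for illustration only.
message = "yaar exam was so tough"

bigrams = word_ngrams(message, 2)
trigram_chars = char_ngrams("yaar", 3)
print(bigrams)
print(trigram_chars)
```

In a classical pipeline, counts or weights of such n-grams would form the feature vector passed to the classifier.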

Summary

Introduction

A single piece of text may contain one or more emotions. A single-label emotion classification task aims to predict only one emotion for a text. Its main drawback is that it captures only one emotion in a given text, which makes it difficult to fully understand the author’s emotional state. Multi-label emotion classification overcomes this limitation by capturing all possible emotions, and it has potential applications in various domains. In e-learning, it can help adapt learning techniques to the learner. In health care, it can help determine the patient’s feelings and comfort level towards the treatment. It can also be used, for example, in stock market monitoring or in prioritizing calls in a call center.
