Constructing a Heterogeneous Training Dataset for Emotion Classification

Anchal Gupta, Satish Mahadevan Srinivasan

doi:10.1016/j.procs.2020.02.259


Open Access



Journal: Procedia Computer Science
Publication Date: Jan 1, 2020
Citations: 2
License type: cc-by-nc-nd

Affiliation: Pennsylvania State University

Abstract

Emotion classification deals with identifying the emotions expressed within a text. Social media generates a vast amount of emotion-rich data in the form of tweets, status updates, blog posts, etc. Tweets are a good representative of the emotions a person usually expresses publicly, so analyzing the emotions in them gives an idea of how a person feels about the subject they are referring to. Machine Learning (ML) techniques are widely used for analyzing emotions within tweets. However, there are no balanced training datasets available for training ML classifiers, and none of the existing datasets is suited to training classifiers to identify and classify emotions within tweets specifically. As a result, supervised classifiers perform poorly at classifying emotions within texts, particularly within tweets. Therefore, in this paper we propose a novel approach for constructing a balanced, heterogeneous training dataset for emotion classification of tweets. Using the lexicon-based NRC classifier, we classified the textual instances into four emotions: joy, sadness, anger, and surprise. Using this as a training dataset, we trained six machine learning models: Multiclass Logistic Regression (MLR), Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). Our study reveals that this approach has the potential to boost the performance of supervised classifiers for emotion classification within tweets.
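The abstract describes two steps: lexicon-based labeling of raw texts into four emotions, then assembling a balanced training set from those labels. The sketch below illustrates that idea under several assumptions that are not taken from the paper. `TOY_LEXICON` is a hypothetical, tiny word-to-emotion map standing in for the NRC Emotion Lexicon (the real lexicon is far larger and covers more emotions than these four). Each text is labeled by majority vote over lexicon hits, with ties and no-hit texts dropped. Balance is achieved by downsampling every class to the smallest class size. The paper's exact labeling rules and balancing procedure may differ.

```python
from collections import Counter
import random
import re

# Hypothetical toy lexicon (word -> set of emotions) standing in for the NRC Emotion Lexicon.
TOY_LEXICON = {
    "happy": {"joy"}, "love": {"joy"}, "great": {"joy"},
    "cry": {"sadness"}, "lonely": {"sadness"}, "miss": {"sadness"},
    "hate": {"anger"}, "furious": {"anger"}, "annoyed": {"anger"},
    "wow": {"surprise"}, "unexpected": {"surprise"}, "shocked": {"surprise"},
}
EMOTIONS = ("joy", "sadness", "anger", "surprise")


def label_text(text, lexicon=TOY_LEXICON):
    """Return the emotion with the most lexicon hits, or None on no hits or a tie (assumed rule)."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous text: drop it rather than guess
    return ranked[0][0]


def build_balanced_dataset(texts, seed=0):
    """Label texts with the lexicon, then downsample each class to the smallest class size."""
    by_label = {emotion: [] for emotion in EMOTIONS}
    for text in texts:
        label = label_text(text)
        if label is not None:
            by_label[label].append(text)
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    dataset = [(text, label) for label, items in by_label.items() for text in rng.sample(items, n)]
    rng.shuffle(dataset)  # mix classes so downstream training does not see them in blocks
    return dataset


if __name__ == "__main__":
    texts = [
        "I am so happy, love this great day",
        "I miss you and feel lonely",
        "I hate this, so annoyed",
        "Wow, that was unexpected",
        "Great news, happy again",
        "Nothing to see here",
    ]
    data = build_balanced_dataset(texts)
    # Each of the four emotions appears exactly once (joy is downsampled from 2 to 1).
    print(Counter(label for _, label in data))
```

The resulting `(text, label)` pairs could then be fed to any of the supervised models listed above. Downsampling is only one way to balance classes; collecting more data for under-represented emotions avoids discarding examples.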
