Abstract

People express their reactions in the media in many ways, and text is one of the most common: comments, reviews, blog posts, messages, and so on. Analysis of the emotions expressed in such text is in high demand for a variety of purposes. This research presents a machine-learning method for sentiment analysis of text. The authors trained a classifier based on the BERT encoder that recognizes emotions in English-language, chat-style text messages. To handle raw chat-style messages, the authors developed an enhanced text standardization layer. The model identifies the following emotions: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise. It solves a multi-class, multi-label text classification problem, meaning that more than one class can be predicted for a single piece of text. The model was trained on the GoEmotions dataset, which consists of 54,263 Reddit comments. On the test set, it reached a macro-averaged F1-score of 0.50704 for emotion prediction and 0.7349 for sentiment prediction, improving on the baseline approach by 10.2% for emotions and by 6.5% for sentiments.
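The abstract frames emotion recognition as multi-label classification evaluated with a macro-averaged F1-score. The sketch below illustrates both ideas under stated assumptions: the classifier is assumed to emit one raw score (logit) per emotion, each emotion is switched on independently by a sigmoid with a 0.5 threshold, and a label with no true or predicted positives is assumed to score 0. The helper names `predict_labels` and `macro_f1`, the threshold, and the three-label example are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical multi-label decision: each label gets its own sigmoid,
    so several labels (or none) can be active for the same text."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over binary indicator rows."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


if __name__ == "__main__":
    # Hypothetical label order: joy, love, anger.
    print(predict_labels([2.0, 0.3, -1.5]))  # [1, 1, 0]: two emotions at once
    y_true = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
    y_pred = [[1, 1, 0], [0, 1, 1], [0, 0, 0]]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Because macro averaging gives every label equal weight, rare emotions influence the score as much as frequent ones, which is one reason macro-F1 values on a 27-emotion task tend to be lower than on a coarser sentiment task.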
