Background: Reddit comments are a valuable source of natural language data where emotion plays a key role in human communication. However, emotion recognition is a difficult task that requires understanding the context and sentiment of the texts. In this paper, we aim to compare the effectiveness of four recurrent neural network (RNN) models for classifying the emotions of Reddit comments. Methods: We use a small dataset of 4,922 comments labeled with four emotions: approval, disapproval, love, and annoyance. We also use pre-trained Glove.840B.300d embeddings as the input representation for all models. The models we compare are SimpleRNN, Long ShortTerm Memory (LSTM), bidirectional LSTM, and Gated Recurrent Unit (GRU). We experiment with different text preprocessing steps, such as removing stopwords and applying stemming, removing negation from stopwords, and the effect of setting the embedding layer as trainable on the models. Results: We find that GRU outperforms all other models, achieving an accuracy of 74%. Bidirectional LSTM and LSTM are close behind, while SimpleRNN performs the worst. We observe that the low accuracy is likely due to the presence of sarcasm, irony, and complexity in the texts. We also notice that setting the embedding layer as trainable improves the performance of LSTM but increases the computational cost and training time significantly. We analyze some examples of misclassified texts by GRU and identify the challenges and limitations of the dataset and the models Conclusion: In our study GRU was found to be the best model for emotion classification of Reddit comments among the four RNN models we compared. We also discuss some future directions for research to improve the emotion recognition task on Reddit comments. Furthermore, we provide an extensive discussion of the applications and methods behind each technique in the context of the paper.
Read full abstract