Abstract
This research aims to compare the performance of three classification algorithms, namely Random Forest, XGBoost, and LightGBM, in classifying emotions in Reddit comments. Emotion classification in Reddit comments is a complex classification problem due to its numerous variations and ambiguities. This research utilizes the GoEmotions Fine-Grained dataset, filtered down to 7,325 Reddit comments with 5 different basic emotion labels. In this study, data preprocessing steps, feature extraction using CountVectorizer and TF-IDF, and hyperparameter tuning using GridSearchCV for each algorithm are conducted. Subsequently, model evaluation is performed using Cross-Validation and confusion matrix. The results of the study indicate that Random Forest outperforms the XGBoost and LightGBM algorithm with an accuracy of 75.38% compared to XGBoost with 69.05% accuracy and LightGBM with 66.63% accuracy.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.