Abstract

Enormous amounts of data are generated as feedback and comments on online platforms such as social media, e-commerce, education, and programming sites. These comments hold significant value for strategic decision-making, but analyzing them effectively poses a major challenge. This research addresses the need for an efficient comment classification model. To this end, we propose a robust ensemble machine learning (ML) model, CommentClass (RF+AdaBoost+SVM+Soft-Voting), which combines random forest (RF), AdaBoost, and support vector machine (SVM) classifiers through soft voting and is designed specifically for comment classification. First, we developed eight pipelines using various combinations of ML algorithms. Next, we incorporated fundamental ensemble techniques, namely stacking, blending, hard voting, soft voting, and averaging, into these pipelines to improve classification performance. These ensemble models capture latent characteristics of diverse text comments, which leads to superior classification accuracy. The proposed CommentClass model achieved an accuracy and F1-score of approximately 98% on the YouTube dataset, an improvement in accuracy of approximately 3% over prior research on the same dataset. CommentClass also obtained higher F1-scores than other sophisticated models on the Spambase (90.26%), IMDB (87.04%), and Twitter (75.74%) datasets. Furthermore, it achieved high accuracy on the SMS dataset and on two distinct synthetic datasets.
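The abstract does not state how CommentClass combines its three base classifiers beyond the label "Soft-Voting". The sketch below assumes the standard definition: average each model's predicted class probabilities and pick the class with the highest average. Hard voting, by contrast, counts each model's top label. The model names and probability values are hypothetical placeholders, not outputs reported in this work, and the functions are illustrative rather than the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Probs = Dict[str, Sequence[Sequence[float]]]  # model name -> per-sample class probabilities


def soft_vote(prob_by_model: Probs, weights: Optional[Dict[str, float]] = None) -> List[int]:
    """Average (optionally weighted) class probabilities across models, then take argmax."""
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # unweighted soft voting
    total = sum(weights[name] for name in names)
    n_samples = len(prob_by_model[names[0]])
    n_classes = len(prob_by_model[names[0]][0])
    labels = []
    for i in range(n_samples):
        avg = [
            sum(weights[name] * prob_by_model[name][i][c] for name in names) / total
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=lambda c: avg[c]))
    return labels


def hard_vote(prob_by_model: Probs) -> List[int]:
    """Majority vote over each model's most probable class (ties: first label seen wins)."""
    names = list(prob_by_model)
    n_samples = len(prob_by_model[names[0]])
    labels = []
    for i in range(n_samples):
        votes = [
            max(range(len(prob_by_model[n][i])), key=lambda c: prob_by_model[n][i][c])
            for n in names
        ]
        labels.append(Counter(votes).most_common(1)[0][0])
    return labels


# Hypothetical probabilities for two comments; class 0 = ham, class 1 = spam.
probs = {
    "rf": [[0.60, 0.40], [0.30, 0.70]],
    "adaboost": [[0.55, 0.45], [0.45, 0.55]],
    "svm": [[0.20, 0.80], [0.10, 0.90]],
}

print(soft_vote(probs))  # [1, 1]
print(hard_vote(probs))  # [0, 1]
```

For the first comment, two of the three models narrowly favor class 0, so hard voting returns 0. The SVM's confident class-1 probability raises the averaged class-1 score to 0.55, so soft voting returns 1. This sensitivity to model confidence is the usual motivation for choosing soft voting over hard voting.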
