Abstract
Social media platforms and microblogging websites have rapidly gained popularity over the past few years. These platforms are used to express views and opinions about products, personalities, and events. Discussions and debates on these platforms often escalate into fights involving rude, disrespectful, and hateful comments, known as toxic comments. Identifying toxic comments is therefore regarded as an essential task for social media platforms. This study introduces an ensemble approach, called regression vector voting classifier (RVVC), to identify toxic comments on social media platforms. The ensemble combines logistic regression and a support vector classifier under the soft voting criterion. Several experiments are performed on imbalanced and balanced datasets to analyze the performance of the proposed approach. To balance the data, the synthetic minority oversampling technique (SMOTE) is applied to the imbalanced dataset. Furthermore, two feature extraction approaches, term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW), are used to investigate their suitability. The performance of the proposed approach is compared with several machine learning classifiers using accuracy, precision, recall, and F1-score. Results suggest that RVVC outperforms all the individual models when TF-IDF features are used with the SMOTE-balanced dataset, achieving an accuracy of 0.97.
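The soft-voting combination described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. It is not the study's implementation: the toy comments, the helper name `build_rvvc_sketch`, the linear kernel, and all hyperparameters are hypothetical choices made for illustration, and the SMOTE balancing step is omitted.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up comments for illustration only (1 = toxic, 0 = non-toxic).
texts = [
    "you are an idiot", "shut up you fool", "what a stupid idea",
    "nobody likes you loser", "you are a pathetic idiot", "stupid fool go away",
    "thanks for the help", "great point well argued", "I agree with this edit",
    "please add a source", "nice work on the article", "thanks for the source",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def build_rvvc_sketch():
    """Hypothetical helper: TF-IDF features fed to a soft-voting LR + SVC ensemble."""
    lr = LogisticRegression(max_iter=1000)
    # probability=True lets the SVC expose class probabilities, which soft
    # voting needs; the linear kernel is an assumption, not taken from the study.
    svc = SVC(kernel="linear", probability=True, random_state=0)
    ensemble = VotingClassifier(
        estimators=[("lr", lr), ("svc", svc)],
        voting="soft",  # average the predicted class probabilities
    )
    return make_pipeline(TfidfVectorizer(), ensemble)


model = build_rvvc_sketch()
model.fit(texts, labels)

new_comments = ["you are an idiot", "thanks for the help"]
proba = model.predict_proba(new_comments)  # one row per comment, one column per class
print(proba)
```

Under soft voting, each row of `proba` is the unweighted mean of the class probabilities produced by the two base models, and the predicted label is the class with the larger averaged probability. In a fuller pipeline, oversampling would be applied to the training features only, before the ensemble is fitted.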
Highlights
Social media platforms and microblogging websites have gained accelerated popularity for social communication between individuals and groups
Results suggest that the regression vector voting classifier (RVVC) gives the highest number of correct predictions when used with term frequency-inverse document frequency (TF-IDF) features from the synthetic minority oversampling technique (SMOTE) over-sampled dataset
This study analyzes the performance of various machine learning models for toxic comment classification and proposes an ensemble approach called RVVC
Summary
Social media platforms and microblogging websites have gained accelerated popularity for social communication between individuals and groups. Through these platforms, people share their thoughts, ideas, and opinions and express their feelings using comments and feedback [1]. Text in online comments contains many hazards, such as fake news, cyberbullying, online harassment, and toxicity [4]. Toxic comments have become a serious issue that affects the reputation of social platforms and causes various psychological problems for users, such as depression, frustration, and even suicidal thoughts [1].