Abstract
Research on cyberbullying detection has gained increasing attention in recent years, as both individual victims and society at large are greatly affected by it. Moreover, easy access to social media platforms such as Facebook, Instagram, and Twitter has led to an exponential increase in the mistreatment of people in the form of hateful messages, bullying, sexism, racism, aggressive content, harassment, and toxic comments. There is therefore a pressing need to identify, control, and reduce bullying content spread across social media sites, which has motivated us to conduct this research to automate the detection of offensive language and cyberbullying. Our main aim is to build single and double ensemble-based voting models to classify content into two groups: `offensive' or `non-offensive'. For this purpose, we have chosen four machine learning classifiers and three ensemble models with two different feature extraction techniques, combined with various n-gram analyses, on a dataset extracted from Twitter. In our work, the Logistic Regression and Bagging ensemble classifiers performed best individually at detecting cyberbullying, and both were outperformed by our proposed SLE and DLE voting classifiers. Our proposed SLE and DLE models yield the best performance of 96% when TF-IDF (unigram) feature extraction is applied with k-fold cross-validation.
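To make the described pipeline concrete, the following is a minimal sketch (not the authors' code) of the kind of setup the abstract outlines: TF-IDF unigram features feeding a voting ensemble that includes Logistic Regression and a Bagging classifier, evaluated with k-fold cross-validation. The base learners beyond those two, the toy data, the fold count, and all hyperparameters are illustrative assumptions rather than values taken from the paper.

```python
# Sketch of a TF-IDF (unigram) + voting-ensemble cyberbullying classifier
# with k-fold cross-validation, using scikit-learn. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Toy tweets and labels (1 = offensive, 0 = non-offensive); a real run
# would load the Twitter dataset described in the paper instead.
texts = [
    "you are such a loser nobody likes you",
    "have a great day everyone",
    "go away you idiot",
    "congrats on the new job!",
] * 10
labels = [1, 0, 1, 0] * 10

# Voting ensemble over individual classifiers (hard majority vote).
# The specific set of base estimators here is an assumption.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("bag", BaggingClassifier(n_estimators=50, random_state=42)),
    ],
    voting="hard",
)

# TF-IDF restricted to unigrams, the best-performing configuration
# reported in the abstract.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), voter)

# k-fold cross-validated accuracy (k=5 is an assumed fold count).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

A double-level ensemble could be built analogously by feeding the outputs of several such voting ensembles into a second voting stage, but the exact composition used in the paper is not specified in the abstract.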