Abstract

Cyberbullying on social media has become a major problem. It can have serious negative mental, emotional, and physical effects on a person's life. However, cyberbullying leaves a record that can serve as evidence and help stop digital abuse. Early detection of cyberbullying on social media is therefore crucial to mitigating its effect on users, and numerous studies aim to identify cyberbullying content automatically. The main gap in cyberbullying detection is the absence of linguistic resources, especially for recently developed languages. Romanized Urdu is one such language, and it is used frequently on social networking platforms in Asian nations. Applying Machine Learning or Deep Learning together with Natural Language Processing (NLP) techniques to detect cyberbullying automatically is an effective way to counter it. The current research develops an efficient framework for detecting cyberbullying using NLP tools with Machine Learning models. The proposed approach is validated on the roman-Urdu-abusive-comment-detector (RUACD) dataset using different preprocessing techniques. Five machine learning models are evaluated on the RUACD dataset: Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), Random Forest (RF), and Decision Tree (DT). The experiments show that SVM, LR, and DT outperform the other models and achieve promising results. Finally, an ensemble of these three best-performing models is formed, which achieves a test accuracy of 95.9%.
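
The abstract does not state how the SVM, LR, and DT outputs are combined in the ensemble. As a minimal sketch, assuming hard majority voting over binary labels (1 = abusive, 0 = not abusive), the combination step could look like the following. The function name `majority_vote` and the example labels are hypothetical and are not taken from the study or from the RUACD dataset.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists by hard majority voting.

    predictions[m][i] is the label that model m assigns to comment i.
    Assumption: the ensemble uses a plain majority vote; the abstract
    does not specify the actual combination rule.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_comments = len(predictions[0])
    if any(len(p) != n_comments for p in predictions):
        raise ValueError("all models must label the same number of comments")

    combined = []
    # zip(*predictions) groups the votes that each comment received.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three already-trained models
# for four comments (illustrative values only).
svm_pred = [1, 0, 1, 0]
lr_pred = [1, 1, 0, 0]
dt_pred = [0, 1, 1, 0]

ensemble_pred = majority_vote([svm_pred, lr_pred, dt_pred])
print(ensemble_pred)
```

With three models and two classes a vote can never tie, which is one practical reason to build such an ensemble from an odd number of classifiers.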
