Abstract

The rising use of social media sites and advances in communication technologies have led to a considerable increase in cyberbullying, in which people are intimidated, harassed, and humiliated through digital messaging. Several studies have addressed the identification of cyberbullying text in English and other resource-rich languages, but relatively few have been conducted for low-resource languages such as Bengali. This research focuses on detecting cyberbullying content in Bengali text by experimenting with pre-processing, feature selection, and three types of machine learning (ML) models: classical ML, deep learning (DL), and transformer-based learning. For classical ML, four models are used: support vector machine (SVM), multinomial Naive Bayes (MNB), random forest (RF), and logistic regression (LR). For DL, three models are employed: long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and a convolutional neural network combined with bidirectional LSTM (CNN-BiLSTM). As the transformer-based pretrained model, bidirectional encoder representations from transformers (BERT) is used. With our proposed pre-processing steps, the MNB-based approach achieves the best accuracy among the classical ML models (78.816%), the LSTM-based approach achieves the best accuracy among the DL models (77.804%), and the BERT-based approach outperforms both with 80.165% accuracy.
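
To make the classical ML setting concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python. It is an illustration only, not the implementation evaluated in this work: the whitespace tokenizer, the `MultinomialNB` class, the `alpha` parameter, the label names, and the toy English sentences are all assumptions standing in for the Bengali dataset and the proposed pre-processing and feature selection steps.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace splitting stands in for the paper's
    # pre-processing pipeline, which is not reproduced here.
    return text.lower().split()


class MultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder data (not from the paper's dataset).
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "have a great day friend",
    "thanks for the lovely photo",
]
train_labels = ["bully", "bully", "neutral", "neutral"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("you are a loser"))
print(model.predict("have a lovely day"))
```

In practice, a library implementation with TF-IDF or n-gram features would likely replace this hand-written classifier; the sketch only shows how per-class token counts and a class prior combine into a prediction.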
