Detecting cyberbullying text using the approaches with machine learning models for the low-resource Bengali language

Md Nesarul Hoque,Md Hanif Seddiqui

doi:10.11591/ijai.v13.i1.pp358-367

Md Nesarul Hoque, Md Hanif Seddiqui

Open Access

PDF Available

https://doi.org/10.11591/ijai.v13.i1.pp358-367

Copy DOI

Export

Save

Cite

Abstract
Full-Text PDF
Similar Papers

Abstract

Listen

The rising usage of social media sites and the advances in communication technologies have led to a considerable increase in cyberbullying events. Here, people are intimidated, harassed, and humiliated via digital messaging. To identify cyberbullying texts, several research have been undertaken in English and other languages with abundant resources, but relatively few studies have been conducted in low-resource languages like Bengali. This research focuses on Bengali text to find cyberbullying material by experimenting with pre-processing, feature selection, and three types of machine learning (ML) models: classical ML, deep learning (DL), and transformer learning. In classical ML, four models, support vector machine (SVM), multinomial Naive Bayes (MNB), random forest (RF), and logistic regression (LR) are used. In DL, three models, long short term memory (LSTM), Bidirectional LSTM, and convolutional neural network with bidirectional LSTM (CNN-BiLSTM) are employed. As the transformerbased pre-trained model, bidirectional encoder representations from transformers (BERT) is utilized. Using our proposed pre-processing tasks, the MNB-based approach achieves the best accuracy of 78.816% among the other classical ML models, the LSTM-based approach gains the highest result of 77.804% accuracy among the DL models, and the BERT-based approach outperforms both with 80.165% accuracy.

Full Text