Cyberbullying is a rapidly growing crime that arises from the everyday use of technology by all kinds of people, most notably when opinions or feelings are shared on social media in a harmful manner. It has serious negative effects on society, including depression, anxiety, and suicide. It also reduces productivity, causes psychological damage that can last a lifetime, and increases violence among people. To prevent cyberbullying or take the necessary steps against the harasser, the first step is to detect it. Several works exist on detecting and classifying cyberbullying, but only a few address the Bengali language. As the number of people who communicate on social media in Bengali increases day by day, it is crucial to address this gap and improve both the accuracy and the robustness of cyberbullying detection and classification. For this purpose, we propose an NLP-based model that uses machine learning and deep learning algorithms to detect and classify Bengali comments on social media. This research identifies cyberbullying comments using a multiclass classification strategy. The dataset used to train and evaluate our model is collected from Kaggle and Melany. It contains 56,308 Bengali comments in four distinct categories: not bully, trolls, sexual, and threats. We use several machine learning algorithms (Support Vector Machine, Logistic Regression, Random Forest, XGBoost, and Multinomial Naïve Bayes), one deep learning algorithm (a Recurrent Neural Network, RNN), and two fusion models. In addition, effective preprocessing steps are applied to obtain a suitable dataset. In this study, the Recurrent Neural Network gives the best accuracy, at 86%. This accuracy is sufficient for the model to help social media users and to encourage them to behave ethically online.