Bengali Hate Speech Detection in Public Facebook Pages

Nafisa Hasan Tuli, Ranit Debnath Akash, Nasif Istiak Remon

doi:10.1109/iciset54810.2022.9775900

Abstract

Hate speech is a form of negative communication intended to harm people and communities. It is common in the real world and has reached alarming proportions on social media. Driven by rapid advances in technology and communication, our lives have become increasingly reliant on social media platforms such as Facebook, and the number of social media users in Bangladesh is also growing rapidly. Detecting hate speech on social media is a difficult task even in English. Bengali is a complex language with few available datasets, which makes detecting Bengali hate speech even more challenging. In this paper, we present a new dataset of 10,133 user comments collected from the comment sections of various public Facebook pages. We explore the performance of various machine learning and deep learning models in detecting hate speech, training the models with Bengali pre-trained word embeddings from fastText. We are especially interested in the Convolutional Neural Network (CNN), which, to our knowledge, has not previously been used for hate speech detection framed as binary classification. Another goal of this research is to create a new, large dataset that will facilitate further research on Bengali hate speech detection. In our experiments, all machine learning and deep learning models performed well, with the Support Vector Machine (SVM) performing best among them.
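The abstract does not say how comment-level features are built from the fastText word embeddings, nor which SVM implementation or kernel was used. The following is therefore only a minimal sketch under stated assumptions, not the authors' method: each comment is represented as the mean of its word vectors, and a linear SVM is trained with Pegasos-style stochastic subgradient descent on the hinge loss (a swapped-in solver chosen so the example runs with the standard library alone). The tokens, the 2-dimensional vectors, and the labelled comments are all hypothetical stand-ins; the pre-trained Bengali vectors published by fastText are 300-dimensional.

```python
import random

# HYPOTHETICAL toy vectors standing in for pre-trained Bengali fastText
# embeddings (the published fastText vectors are 300-dimensional).
# The tokens are placeholders, not real Bengali words.
TOY_VECTORS = {
    "kind": [1.0, 0.0],
    "thanks": [0.8, 0.2],
    "insult": [0.0, 1.0],
    "slur": [0.2, 0.9],
}
DIM = 2


def embed(comment, vectors=TOY_VECTORS, dim=DIM):
    """Represent a comment as the mean of its known word vectors (an assumption).

    Out-of-vocabulary tokens are skipped; an all-unknown comment maps to zeros.
    """
    known = [vectors[tok] for tok in comment.split() if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def train_linear_svm(X, y, lam=0.05, epochs=2000, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)) with labels in {+1, -1}.
    A constant 1.0 feature is appended so the bias is learned (and, as a
    simplification, regularized) together with the weights.
    """
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization term
            if margin < 1.0:  # hinge loss is active: step toward y * x
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, comment):
    """Return +1 (hate) or -1 (not hate) for a raw, whitespace-tokenized comment."""
    x = embed(comment) + [1.0]
    score = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # HYPOTHETICAL labelled comments: +1 = hate speech, -1 = not hate speech.
    comments = ["kind thanks", "thanks", "kind", "insult slur", "slur", "insult"]
    labels = [-1, -1, -1, 1, 1, 1]
    weights = train_linear_svm([embed(c) for c in comments], labels)
    for text in ["kind kind", "slur insult", "unknown slur"]:
        print(text, "->", predict(weights, text))
```

Averaging gives every comment a fixed-size feature vector regardless of its length, which is what a linear SVM needs, but it also discards word order; models such as a CNN can instead operate on the sequence of word vectors.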
