Abstract
Social media sites, which have become central to our everyday lives, enable users to freely express their opinions, feelings, and ideas due to the level of depersonalization and anonymity they provide. Without moderation, these platforms can be used to propagate hate speech, and in recent years hate speech on social media has indeed increased. There is therefore a need to monitor and prevent hate speech on these platforms. However, manual moderation is not feasible given the volume of content produced on social media sites. Moreover, the language used and the length of the messages pose a challenge for classical machine learning approaches as prediction methods. This paper presents a genetic programming (GP) model for detecting hate speech in which each chromosome represents a classifier that employs universal sentence encoder features. A novel mutation technique that affects only the feature values, combined with the standard one-point mutation technique, improved the performance of the GP model by enriching the offspring pool with alternative solutions. The proposed GP model outperformed all state-of-the-art systems on the four publicly available hate speech datasets.
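A minimal sketch of the hybrid mutation idea described above is given below. It assumes a flat list of real-valued genes derived from the universal sentence encoder features; the representation, parameter names, and probabilities are illustrative assumptions, not the paper's exact design.

```python
# Sketch only: combines a feature-value-only mutation with standard
# one-point mutation, under an assumed list-of-floats chromosome.
import random

def mutate_feature_values(individual, sigma=0.1, indpb=0.2):
    """Perturb only the feature-value genes with Gaussian noise."""
    for i in range(len(individual)):
        if random.random() < indpb:
            individual[i] += random.gauss(0.0, sigma)
    return individual,

def mutate_one_point(individual, low=-1.0, high=1.0):
    """Standard one-point mutation: replace a single gene with a new random value."""
    point = random.randrange(len(individual))
    individual[point] = random.uniform(low, high)
    return individual,

def hybrid_mutation(individual, p_feature=0.5):
    """Apply either the feature-value mutation or one-point mutation,
    enriching the offspring pool with alternative solutions."""
    if random.random() < p_feature:
        return mutate_feature_values(individual)
    return mutate_one_point(individual)
```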
Highlights
With the evolution of technology and the widespread use of social media platforms, users can express their feelings and opinions without limitations or restrictions, which in some cases may initiate and proliferate hate towards others
The experiments were implemented in Python 3.7, and the Distributed Evolutionary Algorithms in Python (DEAP) library was used to build the genetic programming (GP) model (see the sketch after this list)
We experimented with three different settings to assess the contribution of the novel hybrid mutation operator to performance
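As referenced above, a minimal sketch of how a DEAP toolbox for a GP classifier could be set up follows. It assumes each individual is a GP tree whose terminals are universal sentence encoder dimensions; the primitive set, number of features, and operator parameters are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch only: DEAP genetic-programming toolbox over assumed encoder features.
import operator
import random

from deap import base, creator, gp, tools

N_FEATURES = 8  # assumed number of encoder dimensions used as GP terminals

pset = gp.PrimitiveSet("MAIN", N_FEATURES)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addEphemeralConstant("const", lambda: random.uniform(-1, 1))

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
toolbox.register("select", tools.selTournament, tournsize=3)
```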
Summary
With the evolution of technology and the widespread use of social media platforms, users can express their feelings and opinions without limitations or restrictions, which in some cases may initiate and proliferate hate towards others. Even though all major social media platforms try to detect and prevent hate speech, they mainly rely on user reports of such content. Because each platform defines hate speech from its own perspective, slightly varying definitions of hate speech exist across social media. Hate speech detection can be framed as either a binary or a multiclass classification task. In the binary classification task, a document is classified into one of two classes: Hate or Not Hate. Various machine learning approaches, including Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Trees (DT), and Convolutional Neural Network (CNN), have been employed to implement classifiers for hate speech detection [2]. Ensemble models [3] and deep learning approaches [4] have been used as well
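A minimal sketch of the binary formulation described above (Hate vs. Not Hate), using one of the listed classical classifiers, is shown below. The tiny toy texts and TF-IDF features are illustrative assumptions, not the datasets or feature set used in the paper.

```python
# Sketch only: binary Hate / Not Hate classification with a classical model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example hateful message", "example neutral message"]  # toy data
labels = [1, 0]  # 1 = Hate, 0 = Not Hate

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["another example message"]))
```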