Abstract

The aim of this paper is to review machine learning (ML) algorithms and techniques for hate speech detection in social media (SM). The hate speech problem is normally modelled as a text classification task. In this study, we examined the baseline components of hate speech classification using ML algorithms. Five baseline components were reviewed: data collection and exploration, feature extraction, dimensionality reduction, classifier selection and training, and model evaluation. The ML algorithms employed for hate speech detection have improved over time, and new datasets and different performance metrics have been proposed in the literature. Keeping researchers informed of these trends in the automatic detection of hate speech calls for a comprehensive and up-to-date review of the state of the art. The contributions of this study are threefold. First, it equips readers with the necessary information on the critical steps involved in hate speech detection using ML algorithms. Second, the strengths and weaknesses of each method are critically evaluated to guide researchers in choosing among algorithms. Third, some research gaps and open challenges are identified. The variants of ML techniques reviewed include classical ML, ensemble approaches and deep learning methods. Researchers and professionals alike will benefit immensely from this study.
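The five baseline components listed above can be illustrated end to end on a toy example. The following is a minimal sketch in pure Python, not the method of this paper. The six training posts and two test posts are invented placeholders (the tokens slur_a and slur_b stand in for abusive terms), and every function name is hypothetical. Bag-of-words token counts stand in for feature extraction; document-frequency filtering (a simple form of feature selection) stands in for dimensionality reduction; multinomial naive Bayes stands in for the classifier; and precision, recall and F1 on the hate class stand in for model evaluation.

```python
import math
from collections import Counter

# Step 1: data collection and exploration. Toy, invented examples;
# label 1 = hate speech, 0 = non-hate. "slur_a"/"slur_b" are placeholders.
TRAIN = [
    ("you are a slur_a go away", 1),
    ("slur_b people should leave now", 1),
    ("go away slur_a and slur_b", 1),
    ("lovely weather in the park today", 0),
    ("great match at the park last night", 0),
    ("what a lovely day", 0),
]
TEST = [("slur_b go away", 1), ("a great day in the park", 0)]


def tokenize(text):
    # Step 2: feature extraction as lower-cased whitespace tokens (bag of words).
    return text.lower().split()


def select_vocabulary(docs, min_df=2):
    # Step 3: dimensionality reduction by keeping only terms that occur in at
    # least min_df training documents (document-frequency feature selection).
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term for term, count in df.items() if count >= min_df}


def train_nb(data, vocab, alpha=1.0):
    # Step 4: multinomial naive Bayes with additive (Laplace) smoothing.
    class_docs = Counter(label for _, label in data)
    term_counts = {c: Counter() for c in class_docs}
    for text, label in data:
        term_counts[label].update(t for t in tokenize(text) if t in vocab)
    priors = {c: math.log(n / len(data)) for c, n in class_docs.items()}
    likelihoods = {}
    for c, counts in term_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return priors, likelihoods


def predict(model, text):
    # Pick the class with the highest log posterior; tokens outside the
    # selected vocabulary are ignored.
    priors, likelihoods = model
    scores = {
        c: priors[c] + sum(likelihoods[c][t] for t in tokenize(text) if t in likelihoods[c])
        for c in priors
    }
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive=1):
    # Step 5: precision, recall and F1 for the hate (positive) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: run the five steps in order on the toy data.
vocab = select_vocabulary([text for text, _ in TRAIN])
model = train_nb(TRAIN, vocab)
predictions = [predict(model, text) for text, _ in TEST]
print(predictions, evaluate([label for _, label in TEST], predictions))
```

In practice each step would be replaced by a stronger choice, for example TF-IDF weighting or word embeddings for features and an ensemble or deep learning model for the classifier, while the overall order of the pipeline stays the same.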

Highlights

  • The authors analyse the features for hate speech detection in the literature, which include simple surface features, word generalization, sentiment analysis, lexical resources, linguistic features, knowledge-based features, meta-information and multimodal information. The limitation of these two reviews is that techniques such as deep learning and ensemble approaches are not considered in their work

  • We reviewed techniques such as deep learning and ensemble learning, among others, that have been employed for the automatic detection of hate speech in social media
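To make the idea of simple surface features concrete, the sketch below computes a few of them for a single post. It is a hypothetical illustration in pure Python, not taken from the reviewed works: the feature names, the regular expressions and the example post are assumptions. Character trigrams are included because a misspelled or obfuscated word can still share n-grams with its correct spelling, which whole-word tokens cannot.

```python
import re
from collections import Counter


def char_ngrams(text, n=3):
    # Character n-grams over the lower-cased text, padded with spaces so that
    # word boundaries also produce n-grams.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def surface_features(text):
    # A handful of simple surface features (illustrative selection only).
    tokens = text.split()
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "num_tokens": len(tokens),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_exclamations": text.count("!"),
        # Share of alphabetic characters that are upper case ("shouting").
        "upper_ratio": sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0,
    }


# Usage on an invented example post.
post = "@user GET OUT NOW!! #example http://example.com"
print(surface_features(post))
print(sorted(char_ngrams("hate")))
```

Such hand-crafted features are usually concatenated with bag-of-words or n-gram counts before being passed to a classifier.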

Summary

INTRODUCTION

Social media networks (SMNs) are the fastest means of communication, as messages are sent and received almost instantaneously [1] [2]. More research is being conducted to curb the rising cases of hate speech in social media (SM). The impacts of hate crimes are already overwhelming due to the widespread adoption of SM [6] and the anonymity enjoyed by online users [7]. In this era of big data, it is time-consuming and difficult to manually process and classify massive quantities of text data. ML techniques for hate speech detection have advanced significantly, from classical ML to ensemble and deep learning (DL) techniques. To improve the classification of SM texts as hate speech or non-hate speech, researchers and practitioners require an up-to-date understanding of ML methodologies, which are evolving rapidly.

MOTIVATION
RELATED WORKS
METHODOLOGY
THE CONCEPT OF HATE SPEECH
HATE SPEECH MODELLING
HATE SPEECH CLASSIFICATION
DATA COLLECTION AND EXPLORATION
FEATURE EXTRACTION
DIMENSIONALITY REDUCTION
HATE SPEECH CLASSIFIER
EVALUATION
PERFORMANCE EVALUATION METRICS FOR HATE SPEECH DETECTION
LIMITATIONS
OPEN CHALLENGES IN HATE SPEECH DETECTION
DATASET AND HATE SPEECH DETECTION CHALLENGE
DATA SPARSITY CHALLENGE
UNBALANCED DATASET CHALLENGE
CULTURAL VARIATION
PANDEMIC OR NATURAL DISASTER
VIII. LIMITATIONS OF THE STUDY
CONCLUSION