Abstract

The proliferation of toxic online content has become a significant concern in today's digital landscape, fueled by the widespread use of the internet by individuals from diverse cultural and educational backgrounds. A central challenge in the automated identification of harmful text is distinguishing hate speech from offensive language. In this paper, we undertake a comprehensive examination of two primary modeling approaches for hate speech detection. Using a Twitter dataset, we conduct experiments that extract n-grams as features and feed their term frequency-inverse document frequency (TF-IDF) values to various machine learning models. A comparative analysis across five models shows that Logistic Regression and Gradient Boosting produce the best results.
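The n-gram/TF-IDF pipeline described above can be sketched as follows. This is an illustrative sketch only: the four toy texts and their labels are placeholders, not the paper's Twitter dataset, and the exact n-gram range and model hyperparameters used in the study are not stated in the abstract.

```python
# Hypothetical sketch of the described pipeline: n-gram features,
# TF-IDF weighting, then a logistic-regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus and labels (placeholders, not the paper's data).
texts = [
    "you are awful and I hate your group",
    "this referee made a terrible call",
    "what a lovely day for a walk",
    "have a great weekend everyone",
]
labels = ["hate", "offensive", "neither", "neither"]

# Unigrams and bigrams, weighted by TF-IDF, fed to logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

pred = model.predict(["what a lovely weekend"])[0]
print(pred)
```

In practice the same pipeline can be swapped to other classifiers (e.g. a gradient-boosting model) by replacing the final estimator, which is how a comparative analysis across models is typically run.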
