Abstract

The proliferation of toxic online content has become a significant concern in today's digital landscape, fueled by the widespread use of the internet by individuals from diverse cultural and educational backgrounds. A central challenge in the automated identification of harmful text is distinguishing hate speech from offensive language. In this paper, we undertake a comprehensive examination of two primary modeling approaches for hate speech detection. Using a Twitter dataset, we conduct experiments that extract n-grams as features and feed their term frequency-inverse document frequency (TF-IDF) values to various machine learning models. A comparative analysis across five models shows that Logistic Regression and Gradient Boosting produce the best results.
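The n-gram/TF-IDF pipeline described above can be sketched as follows. This is an illustrative sketch only: the four toy texts and their labels are placeholders, not the paper's Twitter dataset, and the exact n-gram range and model hyperparameters used in the study are not stated in the abstract.

```python
# Hypothetical sketch of the described pipeline: n-gram features,
# TF-IDF weighting, then a logistic-regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus and labels (placeholders, not the paper's data).
texts = [
    "you are awful and I hate your group",
    "this referee made a terrible call",
    "what a lovely day for a walk",
    "have a great weekend everyone",
]
labels = ["hate", "offensive", "neither", "neither"]

# Unigrams and bigrams, weighted by TF-IDF, fed to logistic regression.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

pred = model.predict(["what a lovely weekend"])[0]
print(pred)
```

In practice the same pipeline can be swapped to other classifiers (e.g. a gradient-boosting model) by replacing the final estimator, which is how a comparative analysis across models is typically run.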
