Abstract

Hate speech carries risks that include racist and ethnic violence, fabricated persecution, and various forms of intimidation, which makes its detection a concern in natural language processing. Given the sensitivity of hate speech in our society, it is essential to classify speech into hate and non-hate categories in real time to minimize these risks. The main objective of this work is to investigate selected supervised machine learning algorithms for the classification of hate speech on social media. Term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW) models were used to extract features. The Porter stemmer and WordNet-based lemmatization were applied during the preprocessing step. Models were trained on the datasets using logistic regression, naive Bayes, and random forest, and logistic regression was also used to build the final classifier. Of the data, 80% was used to train the model and 20% to test it. The logistic regression classifier achieved 98% accuracy and a 98% F1-score. These scores indicate high accuracy in hate speech detection and classification.
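The feature extraction and data split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, the unsmoothed IDF formula log(N / df), and the unshuffled split are all assumptions made for illustration, and stemming, lemmatization, and the classifiers themselves are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing (Porter stemming, WordNet lemmatization) is omitted.
    return text.lower().split()


def bag_of_words(docs):
    # Build a sorted vocabulary and one raw term-count vector per document.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(docs):
    # TF is the term count divided by document length;
    # IDF is log(N / df), an unsmoothed variant chosen for simplicity.
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = []
    for row in bow:
        total = sum(row)
        weights.append(
            [(row[j] / total) * idf[j] if total else 0.0
             for j in range(len(vocab))]
        )
    return vocab, weights


def train_test_split(items, train_fraction=0.8):
    # Assumption: a plain ordered cut; a real experiment would shuffle first.
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]
```

In this sketch a term that occurs in every document receives a TF-IDF weight of zero, which is why TF-IDF tends to down-weight common words relative to raw bag-of-words counts. Library implementations often use a smoothed IDF, so their values will differ from the ones produced here.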
