Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques

Sudhir Kumar Mohapatra, Kathiravan Srinivasan, Yuh-Chung Hu, Srinivas Prasad, Tapan Kumar Das, Dwiti Krishna Bebarta

doi:10.3390/app11188575

Abstract

Hate speech on social media can spread quickly among online users and may subsequently escalate into local violence and heinous crimes. This paper proposes a hate speech detection model based on machine learning and text mining feature extraction techniques. In this study, the authors collected English-Odia code-mixed hate speech data from a Facebook public page and manually organized them into three classes. To obtain both a binary and a ternary dataset, the three-class data were further converted into binary classes. Hate speech is modeled by combining a machine learning algorithm with a feature extraction method. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained on the whole dataset, with features based on word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec for both datasets. Using the two datasets, we developed two kinds of models for each feature: binary models and ternary models. The models based on SVM with word2vec achieved better performance than the NB and RF models for both the binary and ternary categories. The results reveal that the ternary models showed less confusion between hate and non-hate speech than the binary models.
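
As a rough illustration of the "combined n-grams weighted by TF-IDF" feature named in the abstract, the following minimal sketch builds word unigrams, bigrams and trigrams and weights them with a textbook TF-IDF formula. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the toy English comments are all assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a sparse dict {ngram: tf * idf}.

    tf is the relative frequency of the n-gram within the document;
    idf is log(N / df), the unsmoothed textbook form (an assumption).
    """
    grams_per_doc = [word_ngrams(d, n_min, n_max) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each n-gram occur?
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Toy placeholder comments, not data from the paper.
toy_comments = ["you are kind", "you are rude", "kind words help"]
toy_vectors = tfidf_vectors(toy_comments)
```

Sparse vectors of this kind could then be passed to an SVM, NB or RF classifier. In practice, a library routine such as scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 3)` covers the same ground, although its default IDF is smoothed and so gives slightly different weights.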

Highlights

Social media is changing the face of communication and the culture of societies around the world [1].
We selected 35 public Facebook pages across several categories, each category containing three to six pages chosen according to the selection criteria for public pages.
This paper proposes a solution for detecting hate speech on social media using machine learning techniques.

Summary

Introduction

Social media is changing the face of communication and the culture of societies around the world [1]. Diverse populations in the country use online social media to communicate, express opinions, engage with friends, and share information [2,3,4]. The anonymity and mobility of online social media enable netizens behind the screen to spread hateful content [5,6]. To control and prohibit hate speech, governments worldwide are framing stringent regulations and monitoring how such policies are implemented within their jurisdictions [9]. The Indian government further monitors social media content to prevent the spread of harmful information, and restricts online hate speech by interrupting internet service from time to time and blocking access to the sites concerned [10,11]. The government has already introduced a law that expands the anti-terrorism law to encompass cyberspace in order to prohibit the dissemination of any terrorizing or obscene information.

Journal: Applied Sciences
Publication Date: Sep 15, 2021
Citations: 16
License type: CC BY 4.0

Similar Papers

Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter
Muhammad Okky Ibrohim ... Indra Budi
01 Jan 2019

Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets
Oluwafemi Oriola ... Eduan Kotze
IEEE Access | Vol. 8
01 Jan 2020

Indonesian Tweets Hate Speech Target Classification using Machine Learning
Sandy Kurniawan ... Indra Budi
03 Nov 2020

Development of an Efficient Method to Detect Mixed Social Media Data with Tamil-English Code Using Machine Learning Techniques
Shibly Fha ... Hmm Naleer
ACM Transactions on Asian and Low-Resource Language Information Processing | Vol. 22
21 Feb 2023
