THAR- Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection

Deepawali Sharma, Aakash Singh, Vivek Kumar Singh

doi:10.1145/3653017

Abstract

During the last decade, social media has gained significant popularity as a medium for individuals to express their views on various topics. However, some individuals also exploit social media platforms to spread hatred through their comments and posts, some of which target individuals, communities, or religions. Given the deep emotional connections people have to their religious beliefs, this form of hate speech can be divisive and harmful, and may result in mental health issues as well as social disorder. Therefore, there is a need for algorithmic approaches to the automatic detection of instances of hate speech. Most of the existing studies in this area focus on social media content in English, and as a result several low-resource languages lack computational resources for the task. This study attempts to address this research gap by providing a high-quality annotated dataset designed specifically for identifying hate speech against religions in the Hindi-English code-mixed language. This dataset, “Targeted Hate Speech Against Religion” (THAR), consists of 11,549 comments and has been annotated by five independent annotators. It comprises two subtasks: (i) Subtask-1 (binary classification) and (ii) Subtask-2 (multi-class classification). To ensure the quality of annotation, the Fleiss' kappa measure has been employed. The suitability of the dataset is then further explored by applying different standard deep learning and transformer-based models. The transformer-based model Multilingual Representations for Indian Languages (MuRIL) is found to outperform the other implemented models in both subtasks, achieving macro-average and weighted-average F1 scores of 0.78 and 0.78 for Subtask-1, and 0.65 and 0.72 for Subtask-2, respectively. The experimental results not only confirm the suitability of the dataset but also advance research towards automatic detection of hate speech, particularly in the low-resource Hindi-English code-mixed language.
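The two kinds of measure named in the abstract can be illustrated with a minimal sketch. It shows how Fleiss' kappa is computed from per-item annotator labels, and how macro-average and weighted-average F1 differ on imbalanced classes. The function names and label strings (`fleiss_kappa`, `f1_scores`, `"hate"`, `"non-hate"`) and all toy data are assumptions made for illustration; they are not taken from the THAR dataset or from the authors' code.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of annotators.

    ratings: list of items, each item a list of labels (one per annotator).
    Undefined (division by zero) if every annotation uses a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][c] = number of annotators who assigned category c to item i
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

def f1_scores(y_true, y_pred, categories):
    """Return (macro-average F1, weighted-average F1)."""
    support = Counter(y_true)
    per_class = {}
    for cat in categories:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cat and p == cat for t, p in pairs)
        fp = sum(t != cat and p == cat for t, p in pairs)
        fn = sum(t == cat and p != cat for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[cat] = 2 * tp / denom if denom else 0.0
    # Macro: every class counts equally; weighted: classes count by their support
    macro = sum(per_class.values()) / len(categories)
    weighted = sum(per_class[c] * support[c] for c in categories) / len(y_true)
    return macro, weighted

if __name__ == "__main__":
    labels = ["hate", "non-hate"]
    # Toy annotations: three comments, five hypothetical annotators each
    toy_ratings = [
        ["hate"] * 5,
        ["non-hate"] * 5,
        ["hate", "hate", "hate", "non-hate", "non-hate"],
    ]
    print("Fleiss' kappa:", fleiss_kappa(toy_ratings, labels))

    # Toy predictions on an imbalanced label set
    y_true = ["hate", "hate", "non-hate", "non-hate", "non-hate", "non-hate"]
    y_pred = ["hate", "non-hate", "non-hate", "non-hate", "non-hate", "hate"]
    print("macro F1, weighted F1:", f1_scores(y_true, y_pred, labels))
```

Because the weighted average scales each class's F1 by its support, it tends to exceed the macro average when minority classes are harder to predict, which is one plausible reading of the gap between the two Subtask-2 scores reported above.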

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing	Publication Date: Mar 18, 2024
Citations: 3	License type: mit

Similar Papers

UJARAN KEBENCIAN (KHITĀB AL-KARĀHIYAH) DALAM RUANG KONTESTASI SOSIAL POLITIK ARAB KONTEMPORER
Yoyo Yoyo
Adabiyyāt: Jurnal Bahasa dan Sastra | VOL. 3
18 Jun 2019

Form of Hate Speech Comments on Najwa Shihab Youtube Channels in The General Election Campaign of President and Vice President of The Republic of Indonesia 2019
Rahayu Pristiwati ... Tsalisa Yuliyanti
Seloka: Jurnal Pendidikan Bahasa dan Sastra Indonesia | VOL. 9
31 Dec 2020

UJARAN KEBENCIAN NETIZEN INDONESIA PADA AKUN TWITTER ES TEH: TINJAUAN LINGUISTIK FORENSIK
Syafruddin ... Refisa Ananda
Semantik | VOL. 13
20 Feb 2024

Hate Speech: A Pragmatic Assessment of the European Court of Human Rights’ Jurisprudence
Alessio Sardo
European Convention on Human Rights Law Review | VOL. 4
29 Nov 2022
