Challenges and Approaches of Code-mixed Hate Speech Detection

Swayam Samparna Dash, Nikunja Bihari Kar

doi:10.1109/mlcss57186.2022.00060

Abstract

Online platforms and social media are highly attractive to internet users. Platforms such as YouTube, Twitter, and Instagram are in high demand because of the services they offer. Users of these sites frequently comment on others' posts, and these comments may contain toxic speech. Some platforms have also raised concerns about the emergence of this activity. Because the growth of hate speech is nearly impossible to control, automated hate speech detection technologies are needed to identify such content. In this work, we focus on multilingual issues, especially Indo-European code-mixed languages. First, we identify some issues related to code-mixed Indian languages. Then, we examine the available solutions to this problem. We review prior work based on machine learning and deep learning techniques and identify its limitations. We analyze the existing solutions and the comparative studies of them. Our implementation is conducted on code-mixed Twitter datasets, which offer several perspectives on hate speech. We perform our experiments on the HASOC 2021 dataset. Our work contributes to the field of hate speech detection by comparing feature extraction methods and classifier algorithms (machine learning and deep learning). More specifically, the proposed work aims to distinguish Hate speech from Non-Hate speech.
