Hate or Non-hate: Translation based hate speech identification in Code-Mixed Hinglish data set

Shankar Biradar, Arun Chauhan, Sunil Saumya

doi:10.1109/bigdata52589.2021.9671526

Abstract

Hate speech identification in social media has emerged as a highly debated research topic in computational linguistics. Understanding linguistic phenomena in low-resource languages, in particular, remains a major problem in natural language processing. Code-mixing is a common phenomenon in social media writing, particularly in multilingual societies such as India. Traditional deep learning techniques trained on monolingual data do not perform well on code-mixed data, and training new models is challenging due to a lack of resources. Converting multilingual data into monolingual data is an important solution to this challenge. In this work, we propose TIF-DNN, a Transformer-based Interpretation and Feature Extraction model, for hate speech identification. In the proposed model, we used the IndicNLP and Englishtohindi libraries for transliteration and translation, respectively, and mBERT for feature extraction. Finally, we compared our results with various baseline and existing models.
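The abstract describes a staged pipeline: transliterate the code-mixed text, translate it into a single language, extract features, then classify. The sketch below illustrates only that composition, under stated assumptions: each stage is a plain Python callable, and the toy lexicon, word list, `str.lower` "transliteration", and threshold rule are hypothetical stand-ins, not the paper's IndicNLP, Englishtohindi, or mBERT components. The toy lexicon maps romanized Hindi words to English purely for readability; it does not reflect the translation direction or tooling used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Each stage is a plain callable so real components (e.g. a transliterator,
# a translator, a transformer encoder) could be swapped in.
Transliterate = Callable[[str], str]
Translate = Callable[[str], str]
Encode = Callable[[str], List[float]]
Classify = Callable[[List[float]], int]


@dataclass
class TranslationPipeline:
    """Hypothetical sketch: code-mixed text -> monolingual text -> features -> label."""
    transliterate: Transliterate
    translate: Translate
    encode: Encode
    classify: Classify

    def to_monolingual(self, text: str) -> str:
        # Steps 1-2: normalise the script, then translate into one language.
        return self.translate(self.transliterate(text))

    def predict(self, texts: Sequence[str]) -> List[int]:
        # Steps 3-4: extract features from the monolingual text, then classify.
        return [self.classify(self.encode(self.to_monolingual(t))) for t in texts]


# Toy stand-ins (hypothetical): a tiny word-level lexicon plays the role of
# translation, and a one-feature "encoder" counts words from a toy word list.
TOY_LEXICON = {"bahut": "very", "bura": "bad", "accha": "good", "hai": "is"}
TOY_FLAGGED = {"bad"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(w, w) for w in text.split())


def toy_encode(text: str) -> List[float]:
    # Single feature: number of words found in the toy flagged list.
    return [float(sum(w in TOY_FLAGGED for w in text.split()))]


def toy_classify(features: List[float]) -> int:
    # Toy rule: 1 = hate, 0 = non-hate.
    return 1 if features[0] > 0 else 0


pipeline = TranslationPipeline(
    transliterate=str.lower,  # stand-in for script normalisation
    translate=toy_translate,
    encode=toy_encode,
    classify=toy_classify,
)

print(pipeline.to_monolingual("Bahut bura hai"))          # very bad is
print(pipeline.predict(["Bahut bura hai", "accha hai"]))  # [1, 0]
```

Keeping each stage as a separate callable makes it straightforward to compare variants, for example the same encoder and classifier with and without the translation step.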

Full Text