Abstract

Hate speech identification in social media has emerged as a highly debated research topic in computational linguistics. Understanding linguistic phenomena in low-resource languages, in particular, remains a major problem in natural language processing. Code-mixing is common in social media writing, particularly in multilingual societies such as India. Traditional deep learning techniques trained on monolingual data do not perform well on code-mixed data, and training new models is challenging due to a lack of resources. Converting multilingual data into monolingual data is an important solution to this challenge. In this work, we propose TIF-DNN, a Transformer-based Interpretation and Feature Extraction Model, for hate speech identification. In the proposed model, we used the IndicNLP and Englishtohindi libraries for transliteration and translation, respectively, and mBERT for feature extraction. We then compared our findings with various baseline and existing models.
