Abstract

One major issue plaguing online social media is hate speech, a complex phenomenon whose identification and target categorization have been studied by the natural language processing community. In recent years, notable studies have addressed hate speech detection using methods ranging from traditional machine learning to complex deep neural network models. However, these studies mainly focus on English, a high-resource language. In multilingual societies such as the Indian subcontinent, English, Hindi and Hindi-English code-mixed language are widespread and convenient for users, yet research on hate speech detection in these languages is still very limited. To fill this gap, we propose an mBERT-GRU framework comprising multilingual BERT embeddings and bidirectional GRU layers to learn the cumulative features for hate speech detection and its target categorization. We evaluated our work on three datasets, HASOC-2019, HS and HEOT, to demonstrate its competitive performance. Our results show that the proposed framework outperformed monolingual and state-of-the-art methods on the English, Hindi and Hindi-English code-mixed datasets, with Macro-F1 values of 0.87, 0.83 and 0.77, respectively.
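For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Macro-F1 score is computed: the F1 score of each class is calculated separately and the per-class scores are averaged with equal weight, so a rare class counts as much as a frequent one. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper's code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative sketch)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical binary labels (not real dataset rows).
gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, Macro-F1 is a common choice for hate speech datasets, where the hateful class is often the minority.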

