Abstract

Detecting hostile content in social media posts (Facebook, Twitter, etc.) is a demanding task in Natural Language Processing. The growth of hostile content across electronic media has opened up new challenges in language understanding, and the task becomes harder in regional languages. AI-based solutions are required to identify hostile content at scale. Although a satisfactory amount of research has been carried out for English, hostile content detection in regional languages is still under development because suitable datasets and tools are unavailable. In terms of the number of speakers, Hindi ranks third in the world and first on the Indian subcontinent. The objective of this article is to design a hostile content detection system for Hindi using coarse-grained (binary) classification and fine-grained (multi-class, multi-label) classification. We observe that performance varies across combinations of baseline learning methods and pre-trained language models. Using the Constraint 2021 Hindi Dataset, this research proposes a Bidirectional Encoder Representations from Transformers (BERT)-based contextual embedding technique, concatenated with emoji2vec embeddings, to classify social media posts in Hindi Devanagari script as hostile or non-hostile. Additionally, for the fine-grained tasks, where hostile posts are sub-categorized as defamation, fake, hate, and offensive, we develop an ensemble classifier that combines different learning methods and embedding models. Our proposed Indic-BERT+emoji model achieves an F1-score of 0.9721 on the coarse-grained task, outperforming the baseline model and other existing models. Our proposed ensemble method also provides better results than the existing models and the baseline model on the fine-grained tasks, with F1-scores of 0.43, 0.82, 0.58, and 0.62 for the defamation, fake, hate, and offensive classes, respectively.
The code and the data are available at https://github.com/skarifahmed/hostile .
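
As a rough illustration of the concatenation step mentioned above, the following minimal sketch joins a text embedding with an emoji embedding to form one feature vector. It is not the authors' implementation: the function names, the toy two-dimensional emoji table, the three-dimensional stand-in for a BERT sentence vector, and the choice to average emoji vectors are all assumptions made for illustration.

```python
from typing import Dict, List


def mean_emoji_vector(text: str, emoji_table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of every emoji character found in `text`.

    Assumption: emoji vectors are pooled by averaging; the abstract does not
    say how they are pooled. Only single-code-point emojis are matched here.
    """
    found = [emoji_table[ch] for ch in text if ch in emoji_table]
    if not found:
        # No known emoji in the post: fall back to a zero vector.
        return [0.0] * dim
    return [sum(column) / len(found) for column in zip(*found)]


def concat_features(text_vec: List[float], emoji_vec: List[float]) -> List[float]:
    """Concatenate the text embedding and the emoji embedding."""
    return list(text_vec) + list(emoji_vec)


# Hypothetical toy lookup table standing in for pre-trained emoji2vec vectors.
EMOJI_TABLE = {"😡": [1.0, 0.0], "🙂": [0.0, 1.0]}

post = "यह गलत है 😡😡🙂"
text_vec = [0.1, 0.2, 0.3]  # hypothetical stand-in for a BERT sentence embedding

features = concat_features(text_vec, mean_emoji_vector(post, EMOJI_TABLE, dim=2))
print(len(features))
```

The combined vector would then be passed to a classifier; in a real pipeline the text vector would come from a pre-trained model such as Indic-BERT rather than a hand-written list.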
