Hate speech detection from textual data: related papers in the Hindi language
Answer from top 10 papers
Several studies address hate speech detection in Hindi text. Jafri et al. (2024) introduce HateCheckHIn, an evaluation dataset for multilingual hate speech models that uses Hindi as its base language. Sharma et al. (2021) discuss the challenges of hate speech detection in Hindi and Marathi, applying deep learning architectures and comparing their performance on the HASOC 2021 datasets. Velankar et al. (2021) present the CHUNAV dataset, designed for hate speech categorization in the context of Indian elections, with tweets labeled "Hate" or "Non-Hate". Khan et al. (2021) propose a data augmentation approach that generates synthetic hate speech data in Hindi, and show that models trained on synthetic data can perform comparably to, or even better than, models trained on limited real data. Rana and Jha (2022) introduce the TABHATE dataset for target-based hate speech in Hindi and explore deep learning and transformer-based models for detection. Gandhi et al. (2024) focus on Hindi-English code-switched text, presenting the MoH pipeline for improving hate speech detection with language models such as Multilingual BERT and MuRIL.
Khullar et al. (2024) do not address Hindi specifically, but their work on Roman Urdu contributes to the broader understanding of hate speech detection in low-resource South Asian languages. It highlights the linguistic challenges specific to the region and the need for detection methods tailored to each language.
In summary, the reviewed papers underscore the importance of developing robust models and datasets for hate speech detection in Hindi. They show that the task is complicated by linguistic diversity, code-switching, and the scarcity of annotated data. To improve the performance of Hindi hate speech detection systems, the studies propose specialized datasets (Jafri et al., 2024; Rana & Jha, 2022; Velankar et al., 2021), deep learning techniques (Rana & Jha, 2022; Sharma et al., 2021), synthetic data generation (Khan et al., 2021), and transliteration pipelines (Gandhi et al., 2024).
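The model comparisons summarized above, such as performance on shared datasets with "Hate" and "Non-Hate" labels, rely on a common evaluation metric. The summary does not say which metrics the individual papers report, so the sketch below assumes macro-averaged F1, a common choice for imbalanced classification. The gold labels and the two sets of predictions (`model_a`, `model_b`) are hypothetical placeholders, not results from any of the cited studies.

```python
# Minimal sketch: comparing two hypothetical classifiers with macro-averaged F1.
# All labels and predictions below are illustrative placeholders.

LABELS = ("Hate", "Non-Hate")


def per_label_f1(gold, pred, label):
    """F1 score for a single label, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives means precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1, so the minority class counts equally."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(per_label_f1(gold, pred, label) for label in labels) / len(labels)


# Hypothetical gold annotations and predictions from two models.
gold = ["Hate", "Non-Hate", "Non-Hate", "Hate", "Non-Hate", "Non-Hate"]
model_a = ["Hate", "Non-Hate", "Non-Hate", "Non-Hate", "Non-Hate", "Non-Hate"]
model_b = ["Hate", "Hate", "Non-Hate", "Hate", "Non-Hate", "Non-Hate"]

if __name__ == "__main__":
    for name, pred in (("model_a", model_a), ("model_b", model_b)):
        print(f"{name}: macro-F1 = {macro_f1(gold, pred):.3f}")
```

Macro averaging gives the "Hate" class the same weight as the more frequent "Non-Hate" class, which matters when annotated hate speech examples are scarce.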
Source Papers