Abstract

Social media has been growing and has given the world a platform to opine, debate, display, and discuss like never before. It has a major influence on research areas that analyze human behavior and social groups, and the phenomenon of social interaction is even being applied in areas such as the Internet of Things. This constant stream of data connecting individuals and organizations across the globe has had a tremendous impact on the functioning of society and even has the power to sway elections. Despite its numerous benefits, social media has certain problems, such as the prevalence of fake news, which has also contributed to the rise of hate speech. Because security is lax across these platforms, such issues persist without repercussions, leading to cyberbullying and defamation and raising grave security concerns. Although some work has been done independently on native scripts, hate speech detection, and code-mixed data, there is a lack of academic research on detecting hate speech in transliterated code-mixed data and in text containing native-language scripts. Research in this field is greatly inhibited by the many variations in grammar and spelling and, more generally, by the scarcity of annotated datasets, especially for native languages. This article proposes a method to automate hate speech detection in code-mixed and native-language text. It presents an architecture in which a TabNet classifier is trained on features extracted with MuRIL from transliterated code-mixed text. The article also shows that the same model performs well on features extracted from Devanagari text, despite having been trained on transliterated data.