An Approach of Hate Speech Identification on Twitter Corpus

Kavita Kumari, Anupam Jamatia

doi:10.1007/978-981-19-7513-4_11

Abstract

The Internet and social media have become widely popular in recent years. Social media usage has grown rapidly worldwide over the last few years, allowing people to engage with one another and share ideas, thoughts, and opinions. Every day, massive amounts of data are disseminated at breakneck speed via social media platforms, reaching a very large audience. Furthermore, the ability to write anonymous posts and comments makes expressing and spreading hate speech even easier. To improve the users’ experience, social media sites are attempting to remove hateful remarks. The main focus of this research work is on developing automated hate speech and offensive language detection models. The study begins with traditional hate speech and offensive language identification approaches and then moves on to advanced hate speech recognition methods for social media. This creates a need for integrated datasets and a hate speech prediction method. This paper describes a study on hate speech and offensive content identification in the English language using various approaches based on machine learning algorithms (support vector machine, decision tree, and so on) and NLP, along with the features used for the classification problem. The models were tested on the HASOC 2021 datasets, and the results show that the ensemble model performs better than the other algorithms on the test dataset across the different tasks. The results and analysis section of this paper offers researchers a comprehensive picture of these approaches.

Full Text