Abstract

Hate speech on social media has recently become an increasingly relevant issue, and several studies have addressed the hate speech phenomenon in online communication. However, detecting hate speech in social media data is not a trivial task. Previous work has pointed out that code-mixed language is a problem for hate speech detection. Indonesia consists of many regions, each with its own local languages, and Indonesians naturally tend to mix their local language with Bahasa Indonesia in everyday conversation, including on social media. This makes Indonesian social media data more difficult to process. In this study, we investigate hate speech detection in code-mixed Indonesian social media by exploiting several available multilingual language resources. Our experiments show that the currently available multilingual language model did not improve performance compared with models that use a monolingual Indonesian language model. We also found that recent neural models obtain better performance than the traditional model. In future work, we plan to apply a transfer learning approach to hate speech detection in Indonesian social media, specifically to address the code-mixing issue.
