Abstract
Hate speech on social media has recently become a pressing issue, and several studies have addressed the phenomenon in online communication. However, detecting hate speech messages in social media data is not a trivial task. Previous work has pointed to code-mixed language as one obstacle to hate speech detection. Indonesia consists of several regions, each with its own local languages, and Indonesians tend to mix their local language with Bahasa Indonesia in everyday conversation, including on social media. This mixing adds to the difficulty of processing Indonesian social media data. In this study, we investigate hate speech detection in code-mixed Indonesian social media by exploiting several available multilingual language resources. Our experiments show that the currently available multilingual language models did not improve performance over models that use a monolingual Indonesian language model. We also found that the most recent neural models obtain better performance than the traditional models. For future work, we plan to implement a transfer learning approach to detect hate speech in Indonesian social media, specifically to deal with the code-mixing issue.