Indonesian Hate Speech Detection Using IndoBERTweet and BiLSTM on Twitter

Juanietto Forry Kusuma, Andry Chowanda

doi:10.30630/joiv.7.3.1035

Open Access

https://doi.org/10.30630/joiv.7.3.1035

Abstract

Hate speech is speech intended to spread hatred toward other people. In the digital era, where everyone is connected through social media, hate speech is growing rapidly and uncontrollably. Many people do not realize they are committing hate speech when they criticize something on social media, because they are unaware of the difference between hate speech and free speech. As a result, victims feel alienated from society, and the people who spread hate speech often face legal consequences. Detecting whether a sentence contains hate speech is therefore essential to counter this lack of awareness. Machine learning algorithms are widely used for this task. In this paper, we use deep learning, a subset of machine learning, and combine IndoBERTweet, the latest IndoBERT model, with a BiLSTM layer, a type of RNN layer. The appearance of IndoBERTweet opens up more opportunities to improve text classification performance with the addition of a BiLSTM layer. The model first builds token representations of the sentence, then processes them and classifies the sentence based on the result. To make the model effective, we trained it on a labeled public dataset retrieved from Twitter. The data are labeled as hate speech or non-hate speech, and the model is trained on these labels. We evaluated our model and achieved an accuracy of 93.7%, an improvement over previous research on classifying hate speech sentences.
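The pipeline described in the abstract (token representations, then a BiLSTM, then a two-class decision) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. The class name `EncoderBiLSTMClassifier`, the hidden size, the pooling choice (concatenating the final forward and backward LSTM states), and the toy embedding used in place of the pretrained encoder are all assumptions made for illustration. Padding masks are omitted for brevity.

```python
import torch
import torch.nn as nn


class EncoderBiLSTMClassifier(nn.Module):
    """Hypothetical sketch: encoder -> BiLSTM -> linear head (2 classes)."""

    def __init__(self, encoder, encoder_dim, lstm_hidden=128, num_labels=2):
        super().__init__()
        # Any module mapping token ids (batch, seq) to states (batch, seq, dim).
        self.encoder = encoder
        self.bilstm = nn.LSTM(
            encoder_dim, lstm_hidden, batch_first=True, bidirectional=True
        )
        self.head = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids):
        token_states = self.encoder(input_ids)
        _, (h_n, _) = self.bilstm(token_states)
        # h_n[-2] is the final forward state, h_n[-1] the final backward state.
        sentence_vector = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.head(sentence_vector)


# Stand-in encoder so the sketch runs offline (assumption). In practice one
# would wrap a pretrained model, e.g. the Hugging Face checkpoint
# "indolem/indobertweet-base-uncased", and pass on its last hidden states.
VOCAB_SIZE, ENCODER_DIM = 1000, 32
toy_encoder = nn.Embedding(VOCAB_SIZE, ENCODER_DIM)
model = EncoderBiLSTMClassifier(toy_encoder, ENCODER_DIM)

# A batch of 4 fake tweets, each 16 token ids long.
input_ids = torch.randint(0, VOCAB_SIZE, (4, 16))
logits = model(input_ids)            # shape (4, 2)
predictions = logits.argmax(dim=-1)  # one class index per tweet, 0 or 1
```

Which index stands for hate speech and which for non-hate speech is a labeling convention fixed at training time. The untrained sketch above only demonstrates the tensor shapes flowing through the architecture.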
