Indonesian Hate Speech Detection Using Bidirectional Long Short-Term Memory (Bi-LSTM)

Aditya Perwira Joan Dwitama, Dhomas Hatta Fudholi, Syarif Hidayat

doi:10.29207/resti.v7i2.4642


Open Access



Abstract

Social media is a platform that allows users to express themselves freely, which includes spreading hate speech. The Indonesian government has issued regulations under the UU ITE (the Electronic Information and Transactions Law) to handle and prevent hate speech on social media. Previous research has used Bi-LSTM to classify text as hate speech or not. Other research proposed detecting hate speech and its categories using Bi-GRU. However, the Bi-GRU model still performs worse than Bi-LSTM, with accuracies of 86.44% and 96.44%, respectively. Therefore, this study aims to build a model that can detect hate speech and its categories. The research offers Bi-LSTM as the classification model and IndoBERT as the tokenization model. The dataset used is a public dataset containing 13 thousand tweets. The best model was obtained with 20 epochs, a batch size of 192, a single Bi-LSTM layer with 40 nodes, and class weighting applied in the optimization process. The pretrained IndoBERT model used to support classification performance is "indobenchmark/indobert-large-p2". The proposed model performs very well, with average accuracy, precision, and recall of 97.66%, 96.50%, and 85.25%, respectively.
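
The abstract mentions class weighting in the optimization process but does not state how the weights are computed. As a minimal sketch, assuming the common "balanced" heuristic (weight = number of samples / (number of classes × class count)), the idea can be illustrated as follows. The function name `balanced_class_weights` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} using n_samples / (n_classes * class_count).

    Assumption: this is the common "balanced" heuristic, not necessarily
    the weighting scheme used by the authors.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels: 1 = hate speech, 0 = not hate speech.
toy_labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

With these weights, errors on the rarer class contribute more to the loss than errors on the majority class, which is the usual motivation for class weighting on imbalanced tweet data. A dictionary of this shape is the kind of value that, for example, the `class_weight` argument of Keras `Model.fit` accepts; whether the authors used Keras is not stated in the abstract.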
