Abstract

Social media is an effective tool for connecting with people and distributing information. However, many people use social media to spread hate speech and abusive language. In contrast to hate speech, abusive language is often used jokingly, with no intent to offend individuals or groups, even though it may contain profanity. As a result, the distinction between hate speech and abusive language is often blurred. Individuals who spread hate speech may face prosecution, as it carries legal implications. Previous research has focused on binary classification of hate speech versus normal tweets. This study aims to classify hate speech, abusive language, and normal messages on Indonesian Twitter. Several machine learning models, including logistic regression and BERT, are used for the classification task. Model performance is assessed using the F1-score. The results show that BERT models outperform the other models, with the BERT-indobenchmark model, which was pretrained on social media text data, achieving the highest F1-score of 85.59. This demonstrates that pretraining BERT on social media data significantly improves classification performance. A classification model that can distinguish hate speech from abusive language would help prevent the spread of hate speech, which carries legal implications.
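
As a rough illustration of the setup described above, the sketch below fine-tunes a BERT checkpoint for the three classes (hate speech, abusive language, normal) and reports an F1-score. The checkpoint name (indobenchmark/indobert-base-p1), the toy tweets, the weighted F1 averaging, and the hyperparameters are illustrative assumptions, not the authors' exact pipeline.

import numpy as np
from datasets import Dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "indobenchmark/indobert-base-p1"  # assumed checkpoint, not necessarily the paper's
LABELS = ["hate_speech", "abusive", "normal"]

# Toy stand-in for the annotated Indonesian tweet corpus.
texts = ["contoh tweet ujaran kebencian", "contoh tweet kasar sebagai candaan",
         "contoh tweet biasa", "contoh lain tweet biasa"]
labels = [0, 1, 2, 2]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

def compute_metrics(eval_pred):
    logits, gold = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted F1 is assumed here as the reported "F1-score".
    return {"f1": f1_score(gold, preds, average="weighted")}

dataset = Dataset.from_dict({"text": texts, "label": labels}).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hate-speech-clf", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    eval_dataset=dataset,  # use a held-out split in practice
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())

The logistic regression baseline mentioned in the abstract would replace the BERT model with a bag-of-words or TF-IDF feature extractor and a linear classifier, evaluated with the same F1-score.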
