Abstract

Hate speech refers to verbal expression or communication that aims to provoke or discriminate against individuals. The Ministry of Communication and Information of Indonesia handled 3,640 cases of hate speech transmitted through digital channels between 2018 and 2021. In South Kalimantan in particular, hate speech in the local language, Banjarese, has become increasingly prevalent in recent years. Despite this, there is a lack of research on using machine learning to detect hate speech in the Banjarese language, specifically on Instagram. This study aimed to address this gap by constructing a dataset of Banjarese language hate speech and comparing several feature extraction techniques and machine learning models for detecting it. The feature extraction methods used were Word N-Gram, Term Frequency-Inverse Document Frequency (TF-IDF), a combination of Word N-Gram and TF-IDF, Word2Vec, and GloVe, while the machine learning methods used were Support Vector Machine (SVM), Naïve Bayes, and Decision Tree. The results revealed that the combination of TF-IDF for feature extraction and SVM as the model achieved exceptional performance. The average recall, precision, accuracy, and F1-score all exceeded 90%, demonstrating the model's ability to identify Banjarese hate speech accurately.
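To make the feature extraction step concrete, the following is a minimal sketch of TF-IDF weighting applied to word n-grams, one of the combinations listed above. It is not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the whitespace tokenization, the unsmoothed weighting (tf = count / terms in document, idf = log(N / df)), and the English placeholder comments are all assumptions made for illustration. The study's actual preprocessing and weighting variant may differ.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max.

    Assumption: lowercasing plus whitespace tokenization is enough
    for this illustration.
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Map each document to a sparse {ngram: tf-idf weight} dictionary.

    Uses the plain textbook form (an assumption, not the paper's stated variant):
      tf  = occurrences of the n-gram / total n-grams in the document
      idf = log(number of documents / documents containing the n-gram)
    """
    grams_per_doc = [word_ngrams(doc, n_max) for doc in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Hypothetical placeholder comments (English, not taken from the dataset).
docs = ["you are stupid", "you are kind", "have a nice day"]
vectors = tfidf_vectors(docs)
```

Each resulting dictionary is a sparse feature vector that could then be passed to a classifier such as an SVM. N-grams shared by several comments (here "you are") receive a lower weight than n-grams unique to one comment (here "stupid").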
