Abstract

On social media, people are free to express their feelings and thoughts. However, people can also use abusive language and hate speech to insult or humiliate individuals or groups on platforms such as Twitter. Various detection methods have been developed to control the spread of abusive language and hate speech in Indonesia, but detection still focuses on monolingual text. As a country with many ethnicities and cultures, Indonesia also has a variety of local languages. This study examines abusive language and hate speech detection on Twitter data that contains five local languages: Javanese, Sundanese, Madurese, Minangkabau, and Musi. We present a preliminary evaluation to identify the best-performing machine learning method for detecting abusive language and hate speech on Twitter in each local language. As classifiers we use several machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT), with TF-IDF weighted word n-grams and character n-grams as features. The experiments use 5-fold cross-validation and are evaluated by F1-score. The SVM classifier with word n-gram features achieves the best F1-score on each dataset.
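
The feature extraction and evaluation steps named above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the whitespace tokenizer, the plain log(N/df) IDF variant, the unstratified fold assignment, and all function names are assumptions made for illustration, and the classifiers themselves (NB, SVM, RFDT) are left out.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams over lowercased, whitespace-split tokens (assumed tokenizer)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams over the lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs_terms):
    """Weight each document's n-grams by tf * log(N / df) (one common variant)."""
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    vectors = []
    for terms in docs_terms:
        tf = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def k_fold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) for k folds; not stratified."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, each fold from `k_fold_splits` would train a classifier on the TF-IDF vectors of the training tweets and score its predictions on the held-out fold with `f1_score`, with the five scores averaged per dataset.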
