Abstract
On social media, people are free to express their feelings and thoughts. However, they can also use abusive language and hate speech to insult or humiliate individuals or groups on platforms such as Twitter. Various detection methods have been developed to control the spread of abusive language and hate speech in Indonesia, but detection still focuses on monolingual text. As a country with many ethnicities and cultures, Indonesia also has a variety of local languages. This study examines abusive language and hate speech detection on Twitter data covering five local languages: Javanese, Sundanese, Madurese, Minangkabau, and Musi. We present a preliminary evaluation to find the best-performing machine learning method for detecting abusive language and hate speech on Twitter in each local language. We use several machine learning algorithms as classifiers, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT), with TF-IDF weighted word n-grams and character n-grams as features. The experiments use 5-fold cross-validation and are evaluated by F1-score. The SVM classifier with word n-gram features achieves the best F1-score on every dataset.
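The building blocks named in the abstract (word and character n-grams, TF-IDF weighting, 5-fold splitting, and the F1-score) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The whitespace tokenization, the `tf * ln(N / df)` weighting variant, the unshuffled contiguous folds, and all function names are assumptions made for illustration, and the classifiers themselves (NB, SVM, RFDT) are omitted.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams of a lowercased, whitespace-tokenized text (assumed tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams of a lowercased text, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs_terms):
    """Weight each term by raw count * ln(N / document frequency).

    This is one common TF-IDF variant, assumed here; the abstract does not
    say which variant was used.
    """
    n_docs = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    weighted = []
    for terms in docs_terms:
        tf = Counter(terms)
        weighted.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weighted


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full experiment, each fold's training indices would select the TF-IDF vectors used to fit a classifier, and the F1-scores on the five held-out folds would be averaged per dataset.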