Abstract

Purpose: Freedom in social media gives rise to the possibility of disturbing users through the sentences they send, which is limited by the Electronic Information and Transactions Law (UU ITE). This research aims to find an effective method for classifying hate speech text data, especially in Indonesian, with many categories expected to minimize this case.Methods: This study used 1.000 data from Twitter with five labels, including religion, race, physical, gender and other (invective or slander). The process started with several steps of preprocessing, data transformation using TF-IDF-ICSρF term weighting and data mining using an Improved KNN algorithm. Then, the results were compared with the TF-IDF and KNN methods to evaluate the differences.Result: Using TF-IDF-ICSρF and Improved KNN algorithms gets an average accuracy value of 88.11%, 17.81% higher compared with the same data and parameters to the K-Nearest Neighbor and TF-IDF algorithms, which get results of 70.30%.Novelty: Based on the comparison results, TF-IDF-ICSρF and Improved KNN methods can effectively classify hate speech sentences that have many labels with fairly good accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call