Abstract

Hate speech is a significant issue in content management on social media platforms. Classifying it effectively is crucial for maintaining a safe social media environment, combating discrimination, and protecting users. This study evaluates a hate speech classification model based on a support vector machine (SVM) with linear and polynomial kernels, using a dataset of labeled Indonesian-language tweets. DistilBERT is used as the feature extraction method. Because DistilBERT produces high-dimensional features, dimensionality reduction is needed to reduce model complexity. Principal component analysis (PCA) is therefore applied in five scenarios that reduce the features to 10, 20, 30, 40, and 50 dimensions. Performance is measured with the F1-score under 10-fold cross-validation. With a linear kernel, the model achieves its highest F1-score, 0.75, in the 50-dimensional scenario. With a polynomial kernel, it achieves its highest F1-score, 0.7857, also in the 50-dimensional scenario. These findings indicate that, among the configurations tested, the polynomial kernel with 50 dimensions gives the best performance in classifying hate speech.
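
The evaluation protocol described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the features are synthetic 768-dimensional vectors standing in for DistilBERT embeddings, the polynomial kernel uses scikit-learn's default degree, the folds are stratified and shuffled, and PCA is fitted inside each training fold. The abstract specifies none of these details, and the helper name `evaluate` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumption: synthetic binary data in place of real DistilBERT embeddings
# of the labeled tweets (768 is DistilBERT's hidden size).
X, y = make_classification(
    n_samples=300, n_features=768, n_informative=40, random_state=0
)


def evaluate(X, y, kernels=("linear", "poly"), dims=(10, 20, 30, 40, 50)):
    """Return the mean 10-fold F1-score for each (kernel, n_components) pair."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    results = {}
    for kernel in kernels:
        for n_components in dims:
            # PCA sits inside the pipeline, so it is refitted on the
            # training part of every fold and never sees the held-out part.
            model = make_pipeline(
                PCA(n_components=n_components, random_state=0),
                SVC(kernel=kernel),
            )
            scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
            results[(kernel, n_components)] = scores.mean()
    return results


results = evaluate(X, y)
for (kernel, n_components), f1 in sorted(results.items()):
    print(f"{kernel:>6} kernel, {n_components:2d} dims: mean F1 = {f1:.4f}")
```

The scores printed for synthetic data have no relation to the reported 0.75 and 0.7857; the sketch only shows how the kernel and dimensionality scenarios combine into ten evaluated configurations.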
