Abstract

Hate speech is a significant problem for content moderation on social media platforms. Classifying it effectively is important for keeping social media safe, combating discrimination, and protecting users. This study evaluates a hate speech classification model based on a support vector machine (SVM) with linear and polynomial kernels. The dataset consists of labeled Indonesian-language tweets. DistilBERT is used as the feature extraction method. Because DistilBERT produces high-dimensional features, dimensionality reduction is needed to lower model complexity. Principal component analysis (PCA) is therefore applied in five scenarios, reducing the features to 10, 20, 30, 40, and 50 dimensions. Performance is measured with the F1-Score, and all experiments use 10-fold cross-validation. With the linear kernel, the model achieves its highest F1-Score, 0.75, in the 50-dimensional scenario. With the polynomial kernel, the model achieves its highest F1-Score, 0.7857, also in the 50-dimensional scenario. Among the configurations tested, the polynomial kernel with 50 dimensions therefore gives the best hate speech classification performance.
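The evaluation protocol described above (PCA to 10 through 50 dimensions, SVM with linear and polynomial kernels, F1-Score under 10-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code. It rests on several assumptions: synthetic 768-dimensional vectors stand in for the DistilBERT features, the function name `evaluate` is hypothetical, PCA is fitted inside each fold, the folds are stratified, the F1-Score is the binary F1 of the positive class, and the SVM hyperparameters are the library defaults. The abstract specifies none of these details, and the scores this sketch prints say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# ASSUMPTION: synthetic 768-dimensional vectors stand in for DistilBERT
# features of tweets; real features would come from the language model.
X, y = make_classification(
    n_samples=400, n_features=768, n_informative=40, random_state=0
)

def evaluate(kernel: str, n_components: int) -> float:
    """Mean F1 over 10 folds for one (kernel, PCA dimension) scenario."""
    # PCA sits inside the pipeline so it is refitted on each training fold
    # (an assumption; the abstract does not say where PCA is fitted).
    model = make_pipeline(
        PCA(n_components=n_components, random_state=0),
        SVC(kernel=kernel),  # default hyperparameters, also an assumption
    )
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, scoring="f1", cv=cv).mean()

# The ten scenarios named in the abstract: two kernels, five dimensions.
results = {
    (kernel, dim): evaluate(kernel, dim)
    for kernel in ("linear", "poly")
    for dim in (10, 20, 30, 40, 50)
}

for (kernel, dim), f1 in sorted(results.items()):
    print(f"kernel={kernel:<6} dims={dim:>2} mean F1={f1:.4f}")
```

Fitting PCA inside the pipeline keeps the held-out fold from influencing the projection, which is one common way to avoid optimistic scores. If the study fitted PCA once on the full dataset, the numbers would not be directly comparable to this setup.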
