Data preprocessing approach for machine learning-based sentiment classification

Jatmika Jatmika, Haeni Budiati, Sunneng Sandino Berutu, Fornieli Gulo

doi:10.20895/infotel.v15i4.1030

Abstract

Public sentiment regarding a particular issue, product, activity, or organization can be measured and monitored with an application based on artificial intelligence, using comments circulating on social media as data. However, social media comments follow no standardized writing rules, so non-standard words appear in them frequently. These non-standard words affect how comments are assigned to the positive, negative, and neutral sentiment categories. This study therefore proposes a data preprocessing approach that inserts the Rabin-Karp algorithm to normalize non-standard words. The research consists of several stages: data crawling, data preprocessing, feature extraction, model development (based on the Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) methods), and analysis of the results. The experimental results show that the proposed approach affects the composition of the sentiment categories. In model testing, all models obtain their highest precision in the Positive category, with a value of 1. All models obtain their highest recall in the Neutral category, at almost 1, and their highest f1-score in the Neutral category as well, with an average value of 0.95. Overall, the performance analysis shows that the NB-based and SVM-based models perform better than the DT-based model.
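The abstract does not spell out how the Rabin-Karp algorithm is inserted into preprocessing, so the following is only a minimal sketch of one plausible reading: a rolling-hash search locates each non-standard word in a comment, and a whole-word match is replaced with its standard form. The function names `rabin_karp_find` and `normalize`, the hash parameters `base` and `mod`, and the lexicon entries are all hypothetical choices made for illustration, not taken from the study.

```python
def rabin_karp_find(text: str, pattern: str,
                    base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for start in range(n - m + 1):
        # Compare the actual substring on a hash match to rule out collisions.
        if p_hash == t_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return hits


def normalize(comment: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word occurrences of non-standard words (assumed lexicon)."""
    for slang, standard in lexicon.items():
        pieces, cursor = [], 0
        for pos in rabin_karp_find(comment, slang):
            end = pos + len(slang)
            if pos < cursor:
                continue  # overlaps a span that was already replaced
            before_ok = pos == 0 or not comment[pos - 1].isalnum()
            after_ok = end == len(comment) or not comment[end].isalnum()
            if before_ok and after_ok:
                pieces.append(comment[cursor:pos])
                pieces.append(standard)
                cursor = end
        pieces.append(comment[cursor:])
        comment = "".join(pieces)
    return comment


# Hypothetical lexicon; a real one would be built for the target language.
lexicon = {"gr8": "great", "thx": "thanks", "u": "you"}
print(normalize("thx, u r gr8", lexicon))
```

The word-boundary check is a design choice of this sketch: without it, a short entry such as `u` would also be replaced inside ordinary words like `turn`.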
