Effect of Random Splitting and Cross Validation for Indonesian Opinion Mining using Machine Learning Approach

Mariana Purba, Vina Ayumi, Abdiansah Abdiansah, Handrie Noprisson, Hadiguna Setiawan, Umniy Salamah, Yadi Yadi, Ermatita Ermatita

doi:10.14569/ijacsa.2022.0130917

Abstract

Opinion mining has been a prominent research topic in Indonesia, but many questions remain open. Most past research has focused on machine learning methods and models, and a comparison of the effects of random splitting and cross-validation on performance for Indonesian text data is still needed. The goal of this study is to conduct opinion mining on Indonesian text data with a machine learning model, using both random splitting and cross-validation. The research consists of five stages: data collection, pre-processing, feature extraction, training and testing, and evaluation. Based on the experimental results, TF-IDF features perform better than Count-Vectorizer (CV) features for Indonesian text. The best accuracy, 81%, is obtained with TF-IDF as the feature and Support Vector Machine (SVM) as the classifier under cross-validation. The experimental results also show that cross-validation improves accuracy compared with random splitting.
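
The difference between the two evaluation schemes compared in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names (`random_split`, `k_fold_splits`, `cross_val_mean`), the 80/20 split ratio, and the choice of 5 folds are assumptions made for illustration, since the abstract does not state them. A random split produces one train/test partition and therefore one accuracy estimate, whereas cross-validation rotates the test fold so that every sample is tested exactly once and the scores are averaged.

```python
import random


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Single hold-out split: shuffle the indices once, then cut off a test set.

    The 0.2 test fraction is an illustrative assumption, not a value from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_splits(n_samples, k=5, seed=0):
    """k-fold cross-validation: each sample appears in exactly one test fold.

    k=5 is an illustrative assumption, not a value from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def cross_val_mean(score_fn, n_samples, k=5, seed=0):
    """Average a user-supplied score over all k folds.

    score_fn(train_idx, test_idx) is a hypothetical callback that would train a
    classifier on train_idx and return its accuracy on test_idx.
    """
    scores = [score_fn(tr, te) for tr, te in k_fold_splits(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

In practice `score_fn` would wrap feature extraction (for example TF-IDF) and classifier training (for example an SVM); here it is left as a callback so the sketch stays independent of any particular library.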

Full Text