Abstract
This study evaluates four machine learning models, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes, for sentiment analysis of visitor reviews of the Lokawisata Baturaden tourist attraction. Using 5-fold cross-validation, it aims to determine which model best suits sentiment analysis on the Baturaden review data. The study proceeded in three stages: data preprocessing, feature extraction, and model training. Preprocessing consisted of case folding, text cleaning, tokenization, stopword removal, and stemming. Features were extracted with TF-IDF, and SMOTE was applied to increase data variation and address the class imbalance in the dataset. The results show that SVM performs best, with an accuracy of 0.937, an F1-score of 0.937, a precision of 0.943, and a recall of 0.937. Random Forest also performs well, with an accuracy of 0.918 and an F1-score of 0.918, slightly below SVM. Naive Bayes performs adequately, with an accuracy of 0.845 and an F1-score of 0.841, while KNN performs worst, with an accuracy of 0.651 and an F1-score of 0.544. Based on this evaluation, SVM is recommended as the best model for sentiment analysis of the reviews, with Random Forest as a good alternative. KNN is not recommended because of its lower performance; Naive Bayes can be considered for its speed and simplicity, although its results fall short of SVM and Random Forest. These conclusions can guide model selection for understanding visitor sentiment and improving the visitor experience at the Baturaden tourist attraction.
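As a rough illustration of the preprocessing and TF-IDF stages named above, the following is a minimal standard-library sketch, not the study's implementation. The stopword list, the sample reviews, and the function names `preprocess` and `tf_idf` are all hypothetical. Stemming is omitted because it needs a language-specific stemmer, and the weighting uses the plain textbook form (term frequency times log of inverse document frequency), which may differ from the variant the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "is", "and", "a", "was", "very"}


def preprocess(text):
    """Case folding, text cleaning, tokenization, stopword removal."""
    text = text.lower()                    # case folding
    text = re.sub(r"[^a-z\s]", " ", text)  # cleaning: keep letters and whitespace
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))       # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up reviews; the actual Baturaden data is not reproduced here.
reviews = [
    "The place is beautiful and clean!",
    "The ticket was expensive, the place was crowded.",
    "Beautiful view, very clean.",
]
tokens = [preprocess(r) for r in reviews]
weights = tf_idf(tokens)
```

Terms that occur in only one review (such as "ticket") receive a higher weight than terms shared across reviews (such as "place"), which is the property that makes TF-IDF features useful to the downstream classifiers.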