Abstract
Supervised learning is a method used to make predictions or classifications. Supervised learning algorithms build a model from training data that contains both independent and dependent variables. Common methods for building classification models include logistic regression, Naive Bayes, K-Nearest Neighbor (KNN), and decision trees. When a classification algorithm fails to generalize to new data, the cause is often overfitting or underfitting. K-fold cross-validation is a method that can help avoid overfitting or underfitting and produce a model that performs well on new data. This study tests the Naive Bayes, KNN, Classification and Regression Tree (CART), and logistic regression methods with k-fold cross-validation on two different datasets, using k = 2, 3, 5, 7, and 10. The analysis shows that each classification algorithm performed best with 10-fold cross-validation. In DATA 1, Naive Bayes has the highest average accuracy, 0.67 (67%), with an error rate of 0.33 (33%), followed by CART, KNN, and finally logistic regression. In DATA 2, KNN has the highest average accuracy, 0.66 (66%), with an error rate of 0.34 (34%), followed by CART, Naive Bayes, and finally logistic regression. The results can serve as a reference for predicting the growth direction of the accommodation and food service activities sector.
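The evaluation protocol can be made concrete with a minimal, standard-library-only Python sketch. It runs k-fold cross-validation and reports the mean accuracy across folds together with the error rate (1 - accuracy), the two figures quoted for DATA 1 and DATA 2. It assumes a plain KNN classifier with Euclidean distance and n_neighbors=3, and it shuffles the data before splitting. The function names (`k_fold_indices`, `knn_predict`, `cross_validate`) and the toy data are hypothetical and are not taken from the study, which also evaluated Naive Bayes, CART, and logistic regression.

```python
import random
from collections import Counter
from math import dist


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds.

    Assumption: random shuffling before splitting. Fold sizes differ by
    at most one when n is not divisible by k. Requires k <= n.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Hypothetical KNN: majority vote among the closest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k, n_neighbors=3, seed=0):
    """Return (mean accuracy, error rate) over k folds.

    Each fold is held out once as the test set while the model is
    trained on the remaining k-1 folds; accuracy is averaged over folds.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    mean_acc = sum(accuracies) / k
    return mean_acc, 1 - mean_acc


if __name__ == "__main__":
    # Hypothetical toy data (NOT the study's DATA 1 or DATA 2): two noisy 2-D clusters.
    rng = random.Random(42)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)] + [
        (rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(30)
    ]
    y = [0] * 30 + [1] * 30
    # The same k values used in the study.
    for k in (2, 3, 5, 7, 10):
        acc, err = cross_validate(X, y, k)
        print(f"k={k:2d}  accuracy={acc:.2f}  error rate={err:.2f}")
```

Because the fold loop only calls a predict function, replacing `knn_predict` with another classifier leaves the loop unchanged. This is what allows the same k values to be compared across algorithms.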