The effectiveness of PCA and various hyperparameter settings in SVM and KNN for wine quality estimation

Siyi He

doi:10.54254/2755-2721/31/20230128

Abstract

Wine is popular around the world, and wine quality evaluation is a major concern for wine companies. Predicting wine quality with machine learning is expected to reduce the time and money spent on manual quality assessment. Previous research has focused on simple applications and comparisons of machine learning methods on wine datasets, but has paid little attention to finding the optimal model parameters. This research therefore aims to predict wine quality from known data by implementing several machine learning models and identifying the best one, and to explore the specific parameter values and settings of that model. Five machine learning algorithms were trained and tested on a wine dataset, and the impact of standardization on each was evaluated. Standardization improved the performance of every method except decision tree and AdaBoost. Among the support vector machine (SVM) classifiers, the one with the RBF kernel performed best. K-nearest neighbor (KNN) with twenty-five neighbors, combined with principal component analysis (PCA) retaining five principal components, achieved 90.94% accuracy and was the best-performing algorithm.
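
The best-performing configuration named in the abstract (standardization, then PCA with five components, then KNN with twenty-five neighbors) can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the paper's code: the use of scikit-learn, the synthetic stand-in data, the feature count, the train/test split, and the random seeds are all assumptions, so the printed accuracy has no relation to the reported 90.94%.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a wine dataset (assumption): 1000 samples,
# 11 numeric features, binary quality label. Replace with real data.
X, y = make_classification(
    n_samples=1000, n_features=11, n_informative=6, random_state=0
)

# Hold out 20% of the samples for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize features, project onto 5 principal components,
# then classify with the 25 nearest neighbors.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    KNeighborsClassifier(n_neighbors=25),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.4f}")
```

Wrapping the three steps in one pipeline means the scaler and PCA are fitted on the training split only, which avoids leaking test-set statistics into the model.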

Full Text