Prediction of Wine Quality: Comparing Machine Learning Models in R Programming

Olatunde David Akanbi, Taiwo Mercy Faloni, Sunday Olaniyi

doi:10.51583/ijltemas.2022.11901

Abstract

Judging the quality of wine before consuming or using it is a long-standing practice across ages, fields, and people. Gone are the days when wine quality was judged solely by taste or other physical checks. In this age of data science and machine learning, wine quality can be assessed with reference to several measured features (variables). This work predicts the dependent variable, wine quality, using existing models to analyze the independent variables. The prediction is carried out in the R programming language and compares several machine learning models: linear regression, neural network, Naive Bayes classification, Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF). The data were divided into training and testing portions, with parts set aside for validation. Accuracy under 10-fold cross-validation was used to select the optimal model, and Random Forest performed best. Alcohol is the feature that contributes most to wine quality, while volatile acidity and chloride contribute the least. These findings could assist breweries in deciding what to add or reduce when wine quality is in question.
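
The model-selection procedure summarized in the abstract (score each candidate by accuracy under 10-fold cross-validation, then keep the best) can be sketched as follows. This is only an illustration under stated assumptions: it is written in plain Python rather than the R used in the study, the data are synthetic, and the two candidates (a majority-class baseline and a 1-nearest-neighbour rule) stand in for the models actually compared. Every function name and the single "alcohol"-like feature are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def one_nn_classifier(train_X, train_y):
    """1-nearest-neighbour rule using squared Euclidean distance."""
    def predict(x):
        best = min(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )
        return train_y[best]
    return predict


def cv_accuracy(fit, X, y, k=10, seed=0):
    """Mean held-out accuracy of `fit` over k cross-validation folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n=200, seed=1):
    """Synthetic stand-in data: one made-up 'alcohol' feature, two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice(["low", "high"])
        centre = 9.5 if label == "low" else 12.0  # assumed class centres
        X.append([rng.gauss(centre, 0.5)])
        y.append(label)
    return X, y


X, y = make_toy_data()
results = {
    "majority": cv_accuracy(majority_classifier, X, y),
    "1-NN": cv_accuracy(one_nn_classifier, X, y),
}
best_model = max(results, key=results.get)  # select by cross-validated accuracy
print(results, best_model)
```

Each candidate is evaluated on the same folds (same seed), so the comparison is like for like. Extending the sketch to further models only requires another function with the same `fit(train_X, train_y)` shape that returns a predictor.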

Full Text