Machine Learning on Wine Quality: Prediction and Feature Importance Analysis

Quanyue Xie

doi:10.54097/hset.v41i.6803

Abstract

Wine has become a common household drink, yet opinions on its quality differ widely from person to person. Artificial intelligence can provide a relatively fair assessment and help practitioners identify which features to focus on to improve wine quality. This study trains decision tree and random forest models on wine data to predict quality, and analyses feature importance to identify the features that have the greatest impact on it. First, the raw data are preprocessed: outliers are removed with the IQR method, specifically the data in the first 0.09 and the last 0.09. Second, because the correlation between density and residual sugar is as high as 0.84, density is removed to improve the final prediction accuracy. The parameters of both the decision tree and the random forest are tuned over multiple runs, and three sets of results are reported in this paper. Finally, feature importance is computed from the random forest and presented as a bar chart together with a ranking of the features. The random forest achieves higher prediction accuracy than the decision tree, because the random forest model improves on the decision tree to some extent. In the feature importance analysis, alcohol has the greatest impact on the quality of white wine, while citric acid has the least. Overall, this study adjusts the original dataset, compares the accuracy of different models, and focuses on feature importance based on the random forest model.
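
The two preprocessing steps described above can be illustrated with a minimal sketch. It is not the study's actual code: the function names, the linear-interpolation quantile rule, the reading of "first 0.09 and last 0.09" as cut points at the 0.09 and 0.91 quantiles, and the 0.8 correlation threshold are all assumptions made for illustration. The abstract also does not say which columns are trimmed, so the caller supplies them.

```python
import statistics


def quantile_bounds(values, lower=0.09, upper=0.91):
    """Return (low, high) cut points at the given quantiles (linear interpolation)."""
    ordered = sorted(values)

    def q(p):
        pos = p * (len(ordered) - 1)
        i = int(pos)
        j = min(i + 1, len(ordered) - 1)
        return ordered[i] + (pos - i) * (ordered[j] - ordered[i])

    return q(lower), q(upper)


def trim_rows(rows, columns, lower=0.09, upper=0.91):
    """Keep only rows whose value lies inside the quantile bounds for every listed column."""
    bounds = {c: quantile_bounds([r[c] for r in rows], lower, upper) for c in columns}
    return [
        r for r in rows
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def drop_correlated(rows, keep, drop, threshold=0.8):
    """Remove column `drop` from every row if |corr(keep, drop)| exceeds the threshold."""
    r = pearson([row[keep] for row in rows], [row[drop] for row in rows])
    if abs(r) <= threshold:
        return rows
    return [{k: v for k, v in row.items() if k != drop} for row in rows]
```

With rows stored as dictionaries keyed by feature name, a call such as `drop_correlated(rows, keep="residual sugar", drop="density")` would mirror the removal of density described in the abstract, given the reported correlation of 0.84 and the assumed threshold of 0.8.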

Full Text