Regression Modeling Approaches for Red Wine Quality Prediction: Individual and Ensemble

Amrutha K

doi:10.22214/ijraset.2023.54363

Abstract

This paper compares the performance of several individual regression models, and of regression models combined with ensemble models, in predicting the quality of red wine using the Wine Quality dataset from the UCI Machine Learning Repository. The dataset contains 6,497 samples of red and white vinho verde wines from northern Portugal. Before the models are trained, the data undergo preprocessing to ensure quality and consistency. Five regression algorithms, namely Linear Regression (LR), Random Forest Regressor (RF), Support Vector Regression (SVR), Decision Tree Regressor (DT), and Multi-layer Perceptron Regressor (MLP), are trained and tested on the dataset. In addition, the predictions of these individual regression models are combined with four ensemble models: XGBRegressor (XGB), AdaBoostRegressor (ABR), BaggingRegressor (BR), and GradientBoostingRegressor (GRB). Among the individual models, Random Forest (RF) performs best, with the lowest MAE, MSE, and RMSE values and the highest R² score, which suggests that RF fits the red wine quality dataset better than the other regression models. However, the combination of Random Forest with Bagging Regressor (RF and BR) outperforms the individual models, achieving lower errors and a higher R² score.
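The models above are ranked by MAE, MSE, RMSE, and R². The paper's own evaluation code is not shown here, so the following is only a minimal sketch that assumes the standard definitions of these metrics and computes them from paired true and predicted quality scores. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Return MAE, MSE, RMSE and R^2 for paired true/predicted values (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Residual = true value minus prediction for each sample.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    ss_res = sum(r * r for r in residuals)     # residual sum of squares
    mse = ss_res / n                           # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when every true value is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Hypothetical quality scores and predictions, for illustration only.
    y_true = [5, 6, 7, 5, 6]
    y_pred = [5.2, 5.8, 6.5, 5.1, 6.4]
    for name, value in regression_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, and RMSE and a higher R² indicate a closer fit, which is the criterion the abstract uses to compare the models. In scikit-learn, `sklearn.metrics.mean_absolute_error`, `mean_squared_error`, and `r2_score` compute the same quantities, and RMSE is the square root of MSE.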
