Abstract

Consumers tend to purchase white wines based only on taste and price because the composition of white wine is difficult to study. Popular classifications of white wine usually depend on easily understood aspects such as carbon dioxide pressure and grape harvest time. A more detailed way to classify white wine quality is needed by both consumers and market regulators. In this research, a white wine dataset with 11 parameters and a final quality value is used to train machine learning models for prediction. To keep extreme values from influencing the results, outliers were removed using the Interquartile Range method. After preprocessing, six machine learning models were applied to the dataset to measure their initial accuracies. Random Forest had the best accuracy among all the models. The research then focused on the feature importance of the Decision Tree and Random Forest methods. It found that when two parameters have similar importance, one of them can be removed while maintaining almost the same model accuracy. The number of parameters in both models was reduced from 11 to nine at a cost of less than 3% in accuracy. This offers a practical way to simplify the analysis process in machine learning research.
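
The Interquartile Range filtering mentioned above can be illustrated with a minimal sketch. The abstract does not give the fence multiplier or the quantile convention, so the common 1.5 × IQR rule and the "inclusive" quantile method are assumptions here, and the function names and the `alcohol` column are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional choice, assumed here; the paper does not state it.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, columns, k=1.5):
    """Keep only rows whose value in every listed column lies within its fences."""
    bounds = {c: iqr_bounds([r[c] for r in rows], k) for c in columns}
    return [
        r for r in rows
        if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: one column with a single extreme value.
    rows = [{"alcohol": v} for v in [9, 10, 10, 11, 11, 12, 12, 13, 40]]
    kept = remove_outliers(rows, ["alcohol"])
    print(len(rows), "rows before,", len(kept), "rows after")
```

Because the fences for each column are computed on the full dataset before any row is dropped, the result does not depend on the order in which the columns are checked.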

