Predictive modeling for wine authenticity using a machine learning approach

Nattane Luíza Da Costa, Leonardo A Valentin, Inar Alves Castro, Rommel Melgaço Barbosa

doi:10.1016/j.aiia.2021.07.001

Abstract

This paper aims to classify wines from four South American countries. Each class consists of samples that experts consider representative of one of the following commercial categories: “Argentinean Malbec (AM)”, “Brazilian Merlot (BM)”, “Uruguayan Tannat (UT)” and “Chilean Carménère (CC)”. The 83 samples collected were analyzed for their composition of volatile, semi-volatile and phenolic compounds. We built a classification decision system based on support vector machines (SVM), combined with Correlation-based Feature Selection (CFS) and Random Forest Importance (RFI), which measures the relative importance of the input variables. First, we used CFS to select a subset of variables from among 190 chemical compounds. Thirteen chemicals were selected as being correlated with the category and uncorrelated with each other. These compounds were then ranked by RFI importance and classified with SVM. The study indicated that SVM combined with feature selection methods was able to identify the most important chemicals for classifying the wine samples. Among the compounds identified in the wine samples, the variable subset defined by the feature selection methods (catechin, gallic, octanoic acid, myricetin, caffeic, isobutanol, resveratrol, kaempferol, and ORAC) achieved an accuracy of 93.97% in classifying the commercial categories.
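The CFS step described above (keep variables that are correlated with the category but not with each other) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard CFS merit heuristic, merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), a greedy forward search, absolute Pearson correlation as the correlation measure, and a two-class label coded 0/1. The feature names and the eight toy samples are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a subset: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[f] for f in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[a][b] for a, b in pairs) / len(pairs)
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_forward(columns, target):
    """Greedy forward search: add the best feature while the merit strictly improves."""
    m = len(columns)
    # Assumption: absolute Pearson correlation for both feature-class and
    # feature-feature relationships.
    r_cf = [abs(pearson(col, target)) for col in columns]
    r_ff = [[abs(pearson(columns[a], columns[b])) for b in range(m)] for a in range(m)]
    selected, best = [], 0.0
    while len(selected) < m:
        candidates = [f for f in range(m) if f not in selected]
        feat = max(candidates, key=lambda f: cfs_merit(selected + [f], r_cf, r_ff))
        merit = cfs_merit(selected + [feat], r_cf, r_ff)
        if merit <= best:
            break
        selected.append(feat)
        best = merit
    return selected, best


# Hypothetical toy data: 8 samples, two classes coded 0/1 (not from the study).
target = [0, 0, 0, 0, 1, 1, 1, 1]
names = ["marker_a", "marker_a_copy", "noise", "marker_b"]
columns = [
    [1, 2, 1, 2, 5, 6, 5, 6],      # informative
    [2, 4, 2, 4, 10, 12, 10, 12],  # exact rescaling of marker_a (redundant)
    [3, 1, 4, 1, 3, 1, 4, 1],      # unrelated to the class
    [1, 1, 2, 2, 5, 5, 6, 6],      # informative, not a copy of marker_a
]

selected, merit = cfs_forward(columns, target)
selected_names = [names[i] for i in selected]
print(selected_names, round(merit, 4))
```

On this toy data the search keeps `marker_a` and `marker_b` and skips both the redundant copy and the noise column, since adding either would not raise the merit. In the workflow the abstract describes, the selected variables would then be ranked by RFI and passed to an SVM classifier; those steps are outside this sketch.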

Full Text