In this paper, samples of Cabernet Sauvignon wines produced in California were analyzed on the basis of their elemental content and classified according to their geographical origin using machine learning. In total, 13 elements (Al, Cd, Co, Cr, Cu, Li, Mn, Ni, P, Pb, Rb, Sr, and Zn) were determined by inductively coupled plasma mass spectrometry (ICP-MS). We used two variable selection algorithms to estimate the relevance of each element to the classification. Predictive models based on chemometric tools and machine learning algorithms were developed to differentiate the origin of the wine samples. Li and Sr were identified as the elements mainly responsible for the differentiation of the samples. With Random Forest, all samples were correctly classified. A second analysis was performed with the variables Li and Sr removed, in order to investigate the relevance of the remaining elements. We found that a group of seven variables (Cd, Ni, Mn, Pb, Rb, Co, Cu) was able to discriminate the wines with 89% accuracy using Support Vector Machines. The results suggest that the developed methodology, based on advanced machine learning techniques, is robust and reliable both for the geographical classification of wine samples and for the study of the elements that characterize each region.
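
The two-stage analysis summarized above (rank the elements, then classify again without Li and Sr) can be illustrated with a minimal sketch in scikit-learn. This is not the authors' code or data: the concentrations are synthetic, the number of regions and samples is assumed, Random Forest importances stand in for the two unnamed variable selection algorithms, and the function names, hyperparameters, RBF kernel, and 5-fold cross-validation are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The 13 elements listed in the abstract.
ELEMENTS = ["Al", "Cd", "Co", "Cr", "Cu", "Li", "Mn",
            "Ni", "P", "Pb", "Rb", "Sr", "Zn"]


def make_synthetic_wines(n_per_region=30, n_regions=3, seed=0):
    """Hypothetical stand-in for ICP-MS data: only Li and Sr vary by region."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for region in range(n_regions):
        block = rng.normal(0.0, 1.0, size=(n_per_region, len(ELEMENTS)))
        # Assumption: shift Li and Sr per region so they carry the signal.
        block[:, ELEMENTS.index("Li")] += 3.0 * region
        block[:, ELEMENTS.index("Sr")] += 2.0 * region
        X_parts.append(block)
        y_parts.append(np.full(n_per_region, region))
    return np.vstack(X_parts), np.concatenate(y_parts)


def rank_elements(X, y, seed=0):
    """Rank elements by Random Forest impurity-based importance (descending)."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(ELEMENTS[i], float(importances[i])) for i in order]


def svm_accuracy(X, y, keep):
    """Mean 5-fold cross-validated accuracy of an SVM on the chosen elements."""
    cols = [ELEMENTS.index(e) for e in keep]
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return float(cross_val_score(model, X[:, cols], y, cv=5).mean())


X, y = make_synthetic_wines()
ranking = rank_elements(X, y)

# Second analysis: drop Li and Sr and keep the seven-element subset.
reduced = ["Cd", "Ni", "Mn", "Pb", "Rb", "Co", "Cu"]
acc_reduced = svm_accuracy(X, y, reduced)
acc_all = svm_accuracy(X, y, ELEMENTS)
```

Because the synthetic data places all regional signal in Li and Sr, the reduced subset performs near chance here; the 89% accuracy reported in the abstract reflects real regional differences in the other elements that this toy data does not model.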