Abstract
This study investigated the geographical origin classification of green coffee beans at the continental, country, and regional levels. Stable isotope and trace element analyses were combined with non-linear machine learning data analysis to improve coffee origin classification and marker selection. Specialty green coffee beans sourced from three continents, eight countries, and 22 regions were analyzed by measuring five isotope ratios (δ¹³C, δ¹⁵N, δ¹⁸O, δ²H, and δ³⁴S) and 41 trace elements. Partial least squares discriminant analysis (PLS-DA) was first applied to the integrated dataset for origin classification. With PLS-DA, origins were predicted well at the country level and showed promise at the regional level, with discriminating markers selected at all levels. However, PLS-DA predicted origin poorly at the continental and Central American regional levels. Non-linear machine learning techniques improved predictions and identified more origin markers, and the markers identified were more relevant. The best predictive accuracy was obtained with the ensemble decision-tree methods random forest and extreme gradient boosting, with accuracies of up to 0.94 and 0.89 for the continental and Central American regional models, respectively. These results demonstrate the potential of advanced machine learning models to improve origin classification and the identification of relevant origin markers. The decision-tree-based models were superior because of their embedded variable identification features and visual interpretability. © 2023 The Authors. Journal of The Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of Society of Chemical Industry.
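The abstract gives no implementation details, so the following is only a rough illustration of what "embedded variable identification" in a decision-tree ensemble can look like. It is not the authors' pipeline: it uses bagged one-split trees (decision stumps) as a much-simplified stand-in for random forest, runs on synthetic data, and every name in it (`origin_A`, `origin_B`, the single informative marker in column 0, the two noise columns) is a hypothetical chosen for the sketch.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Return the single split with the largest impurity decrease as
    (gain, feature, threshold, left_label, right_label)."""
    parent = gini(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


def fit_ensemble(X, y, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right
                    for _, f, t, left, right in stumps)
    return votes.most_common(1)[0][0]


def importances(stumps, n_features):
    """Share of the total impurity decrease attributed to each feature."""
    total = [0.0] * n_features
    for gain, f, *_ in stumps:
        total[f] += gain
    s = sum(total)
    return [v / s for v in total]


def make_data(n_per_class, rng):
    """Synthetic samples: column 0 separates the classes, columns 1-2 are noise."""
    X, y = [], []
    for label, centre in (("origin_A", 0.0), ("origin_B", 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_data(50, rng)
X_test, y_test = make_data(50, rng)

stumps = fit_ensemble(X_train, y_train)
accuracy = sum(predict(stumps, r) == lab for r, lab in zip(X_test, y_test)) / len(y_test)
imp = importances(stumps, 3)

print("held-out accuracy:", accuracy)
print("importance per feature:", imp)
```

The importance scores fall out of the fitting procedure itself, since each split records how much impurity it removed. A linear projection method such as PLS-DA needs a separate step to rank variables. Real random forest and gradient-boosted models use deeper trees and, for random forest, random feature subsets at each split, which this sketch omits.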