Comparative Study of Several Machine Learning Algorithms for Classification of Unifloral Honeys.

Fernando Mateo, Andrea Tarazona, Eva María Mateo

doi:10.3390/foods10071543

Open Access

https://doi.org/10.3390/foods10071543

Journal: Foods (Basel, Switzerland)	Publication Date: Jul 3, 2021
Citations: 14	License type: CC BY 4.0

Affiliation: University of Valencia

Abstract

Unifloral honeys are in high demand among consumers, especially in Europe. The classical method for verifying that a honey belongs to a prized botanical class is palynological analysis, in which pollen grains are identified and counted. This task requires highly trained personnel, which complicates the characterization of honey botanical origins. Organoleptic assessment of honey by expert personnel helps to confirm such classification. In this study, the ability of different machine learning (ML) algorithms to correctly classify seven types of Spanish honeys of single botanical origin (rosemary, citrus, lavender, sunflower, eucalyptus, heather and forest honeydew) was compared. The botanical origin of the samples was ascertained by pollen analysis complemented with organoleptic assessment. Physicochemical parameters of the unifloral honeys, such as electrical conductivity, pH, water content, carbohydrates and color, were used to build the dataset. The following ML algorithms were tested: penalized discriminant analysis (PDA), shrinkage discriminant analysis (SDA), high-dimensional discriminant analysis (HDDA), nearest shrunken centroids (PAM), partial least squares (PLS), C5.0 tree, extremely randomized trees (ET), weighted k-nearest neighbors (KKNN), artificial neural networks (ANN), random forest (RF), support vector machine (SVM) with linear and radial kernels and extreme gradient boosting trees (XGBoost). The ML models were optimized by repeated 10-fold cross-validation, primarily on the basis of log loss or accuracy, and their performance was compared on a test set in order to select the best predictive model. Models built using PDA produced the best overall accuracy on the test set. ANN, ET, RF and XGBoost models also provided good results, while SVM models performed worst.
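
The two tuning metrics named in the abstract, log loss and accuracy, can be illustrated with a minimal sketch. It is not the authors' code: the function names and the predicted probabilities below are hypothetical, and only three of the seven honey classes are used to keep the example short. Log loss is the mean negative log-probability that a model assigns to the true class, so unlike accuracy it also penalizes correct but poorly calibrated predictions.

```python
import math

def log_loss(y_true, probas, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probas):
        # Clip so that a predicted probability of exactly 0 does not give log(0).
        p_true = min(max(p[label], eps), 1 - eps)
        total -= math.log(p_true)
    return total / len(y_true)

def accuracy(y_true, probas):
    """Fraction of samples whose most probable class is the true class."""
    hits = sum(1 for label, p in zip(y_true, probas)
               if max(p, key=p.get) == label)
    return hits / len(y_true)

# Hypothetical predictions for three samples (illustrative values only).
y_true = ["rosemary", "citrus", "heather"]
probas = [
    {"rosemary": 0.7, "citrus": 0.2, "heather": 0.1},
    {"rosemary": 0.4, "citrus": 0.5, "heather": 0.1},
    {"rosemary": 0.1, "citrus": 0.1, "heather": 0.8},
]

print("log loss:", log_loss(y_true, probas))
print("accuracy:", accuracy(y_true, probas))
```

In this toy case every sample is classified correctly, yet the log loss is well above zero because the second prediction gives the true class only a probability of 0.5. This sensitivity to confidence is one plausible reason to prefer log loss over accuracy when tuning on a small dataset.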

Highlights

Honey is a natural food appreciated worldwide with high nutritional value that provides many health benefits [1,2]
A comparison of machine learning (ML) algorithms on a dataset of one hundred honeys harvested in Spain and belonging to seven unifloral classes was performed using physicochemical parameters
The ML models were built by splitting the dataset into a training set (70%) and a test set (30%) and optimizing each model's configuration by 10-fold cross-validation using several metrics, but mainly log loss
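
The data partitioning described in the highlights above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the per-class sample counts are invented (the source gives only the total of one hundred honeys in seven classes), stratifying the 70/30 split by class is an assumption, and the number of repeats (3) is a placeholder because the source does not state it. The helper names `stratified_split` and `repeated_kfold` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test, holding out ~test_fraction per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Rounding per class means the overall test share is only approximately 30%.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=10, repeats=3, seed=0):
    """Yield (training, validation) index lists for repeated k-fold CV.

    Folds are not stratified here; each repeat reshuffles the indices.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            training = [idx for j, fold in enumerate(folds) if j != i
                        for idx in fold]
            yield training, validation

# Hypothetical class sizes summing to 100 samples (not taken from the paper).
counts = {"rosemary": 15, "citrus": 15, "lavender": 14, "sunflower": 14,
          "eucalyptus": 14, "heather": 14, "honeydew": 14}
labels = [name for name, n in counts.items() for _ in range(n)]

train_idx, test_idx = stratified_split(labels)
for training, validation in repeated_kfold(train_idx):
    pass  # fit a candidate model on `training`, score log loss on `validation`
```

Only the training indices enter the cross-validation loop; the test indices stay untouched until the tuned models are compared, which matches the workflow the source describes.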

Summary

Introduction

Honey is a natural food appreciated worldwide with high nutritional value that provides many health benefits [1,2]. When the nectar is taken predominantly from a single type of flower, the honey produced has characteristic organoleptic properties, which add to its commercial value. Many consumers prize these sensory properties, which raises the price of these honeys relative to other types. Because bees can normally forage on a huge variety of floral sources, and because plant species themselves vary greatly with climatic and growing conditions, the parameters used for characterizing unifloral honeys do not exhibit typical values but are defined in rather large, often overlapping ranges [7]. The differences observed in honey composition depend on a variety of factors, such as the region, season, nectar source, beekeeping practices and harvest period [8].
