Supervised binary classification methods for strawberry ripeness discrimination from bioimpedance data

Pietro Ibba, Ørjan G Martinsen, Christian Tronstad, Luisa Petti, Stefano Cesco, Tanja Mimmo, Giuseppe Cantarella, Paolo Lugli, Roberto Moscetti

doi:10.1038/s41598-021-90471-5


Open Access

https://doi.org/10.1038/s41598-021-90471-5


Abstract

Strawberry is one of the most popular fruits in the market. To meet the demanding consumer and market quality standards, there is a strong need for an on-site, accurate and reliable grading system during the whole harvesting process. In this work, a total of 923 strawberry fruit were measured directly on-plant at different ripening stages by bioimpedance, with data collected at frequencies between 20 Hz and 300 kHz. The fruit batch was then split into two classes (i.e. ripe and unripe) based on surface color data. Starting from these data, six of the most commonly used supervised machine learning classification techniques, i.e. Logistic Regression (LR), Binary Decision Trees (DT), Naive Bayes Classifiers (NBC), K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron Networks (MLP), were employed, optimized, tested and compared in view of their performance in predicting the strawberry fruit ripening stage. Such models were trained to develop a complete feature selection and optimization pipeline, not yet available for bioimpedance data analysis of fruit. The classification results highlighted that, among all the tested methods, MLP networks performed best on the test set, with 0.72, 0.82 and 0.73 for the F_1, F_{0.5} and F_2-score, respectively. They also improved on the training results, showing good generalization capability and adapting well to new, previously unseen data. Consequently, MLP models trained with bioimpedance data are a promising alternative for real-time estimation of strawberry ripeness directly on-field, which could help farmers and producers manage harvesting time.
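The F_1, F_{0.5} and F_2 scores reported above are instances of the general F_β score, which combines precision and recall and weights recall β times as much as precision. The following is a minimal sketch of that standard definition, not the authors' code; the function name `fbeta` and the confusion-matrix counts in the example are hypothetical and chosen only for illustration.

```python
def fbeta(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts.

    tp, fp, fn: true positives, false positives, false negatives.
    beta < 1 favours precision, beta > 1 favours recall.
    Returns 0.0 when the score is undefined (all counts zero).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts (not from the paper): 8 ripe fruit correctly flagged,
# 2 unripe fruit wrongly flagged as ripe, 4 ripe fruit missed.
f1 = fbeta(8, 2, 4, beta=1.0)
f05 = fbeta(8, 2, 4, beta=0.5)
f2 = fbeta(8, 2, 4, beta=2.0)
print(f"F1={f1:.3f}  F0.5={f05:.3f}  F2={f2:.3f}")
```

In this example precision (0.8) exceeds recall (about 0.67), so the precision-weighted F_{0.5} comes out highest and the recall-weighted F_2 lowest.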

Highlights

Strawberry is one of the most popular fruits in the market
The most widely used and effective discrimination techniques are represented by LR[19], artificial neural networks (ANN)[20], DT[21], SVM[22], NBC[23] and KNN[24]
It is important to underline that in poorly performing models (LR, Decision Trees (DT) and Naive Bayes Classifiers (NBC), Fig. 6a–c) the significance of the selected features provides less information, as a poor fit with the training data might be associated with an arbitrary feature selection

Summary

Introduction

Strawberry is one of the most popular fruits in the market. To meet the demanding consumer and market quality standards, there is a strong need for an on-site, accurate and reliable grading system during the whole harvesting process. The fruit batch was split into two classes (i.e. ripe and unripe) based on surface color data. Starting from these data, six of the most commonly used supervised machine learning classification techniques, i.e. Logistic Regression (LR), Binary Decision Trees (DT), Naive Bayes Classifiers (NBC), K-Nearest Neighbors (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron Networks (MLP), were employed, optimized, tested and compared in view of their performance in predicting the strawberry fruit ripening stage. Two earlier papers that combined bioimpedance and machine learning both achieved good accuracy in the classification task, obtaining 90% (SVM) and 100% (ANN) precision with models trained on a total of 100 and 180 samples, respectively. Such works are a good starting point for a combined bioimpedance and machine learning approach to evaluating fruit quality, but they rely on a small amount of data to develop the models and, most importantly, do not provide a detailed data analysis pipeline, which is strongly needed as a reference for similar works. The models in this work are developed to be coupled with existing portable instruments[30] and employed for on-field strawberry fruit quality assessment, to facilitate and optimize the harvesting process.
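The comparison of the six classifier families named above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic (only the sample count of 923 is taken from the text), the features are not bioimpedance measurements, all hyperparameters are library defaults or arbitrary choices, and the feature selection and optimization steps described in the paper are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data. 923 matches the number of fruit in the
# study, but the 10 features are random and NOT bioimpedance values.
X, y = make_classification(n_samples=923, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One untuned representative per classifier family named in the text.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "NBC": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
}

scores = {}
for name, model in models.items():
    # Standardize features, then fit the classifier on the training split.
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Score on the held-out test split with the three F-beta variants.
    scores[name] = {
        beta: fbeta_score(y_test, y_pred, beta=beta) for beta in (1, 0.5, 2)
    }

for name, s in scores.items():
    print(f"{name}: F1={s[1]:.2f}  F0.5={s[0.5]:.2f}  F2={s[2]:.2f}")
```

Because the data are synthetic, the printed scores say nothing about strawberry ripeness; the sketch only shows how the six model families can be scored on the same held-out split with the same metrics.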

Methods

Results

Conclusion