Most research on intact fruit spectroscopy is derivative in nature, as it primarily showcases applications of existing spectroscopy devices, which are often proprietary. The regression models that researchers develop to predict physicochemical attributes from spectra remain theoretical because there is no mechanism to integrate them back into proprietary devices. This poses a challenge for commercial adoption of the technology in the food quality supply chain. The present study addresses this research gap by presenting a first-of-its-kind approach to classifying tomatoes by lycopene content using a portable short-wave near-infrared (SWNIR) spectrophotometer driven by a chemometrics-machine learning framework. The device integrates open-source hardware (AS7265x multispectral chipset with a wavelength range of 410–940 nanometres (nm), Arduino Uno microcontroller) and software (R platform), housed in an ergonomically designed, 3D-printed cabinet to ensure noise-free spectra acquisition. Lycopene content showed a strong negative correlation with the wavelengths 485, 560 and 585 nm (ρ = −0.65, −0.70 and −0.70, respectively) and a strong positive correlation with 760 nm (ρ = +0.64). Similar associations were observed qualitatively using principal component analysis. Unlike most of the literature, feature selection was based on analysis of variance: the 14 wavelengths that differed significantly over a 15-day storage study (p ≤ 0.05) were selected for model development. The chemometrics-machine learning framework was used to develop optimised probabilistic and non-probabilistic models, namely logistic regression, Linear Discriminant Analysis (LDA), Random Forest (RF), Artificial Neural Networks (ANN) and Support Vector Machine (SVM), using 10-fold cross-validation with an 80–20% train-test split of the dataset.
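As an illustration of the correlation step, the following is a minimal sketch of a rank correlation between the signal at one wavelength and lycopene content. It assumes that ρ denotes Spearman's rank correlation, which the abstract does not state explicitly, and it is written in Python rather than the R platform used in the study. The function names and all data values are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical readings (NOT study data): sensor signal at 560 nm and
# lycopene content for six tomatoes at increasing ripeness.
signal_560 = [0.42, 0.39, 0.35, 0.30, 0.24, 0.21]
lycopene = [1.1, 1.8, 2.9, 4.2, 5.6, 6.3]

rho = spearman_rho(signal_560, lycopene)
print(f"rho = {rho:.2f}")
```

Because the invented signal falls monotonically as lycopene rises, ρ here is close to −1; real spectra would give weaker values such as those reported above.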
In agreement with the literature, the 500–750 nm wavelength range dominated the classification of lycopene content. Notably, specific wavelengths strongly influenced the outcomes of individual classifiers: logistic regression (560 nm), LDA (730 nm, 645 nm, 560 nm, 535 nm), RF (760 nm, 585 nm, 560 nm, 645 nm) and ANN (585 nm, 560 nm). Accuracy, obtained from the confusion matrix on the test dataset, was used as the performance metric to compare the models. Logistic regression and RF achieved an accuracy of 80%, LDA and SVM achieved 90%, and ANN outperformed all other models with an accuracy of 95%. The study thus advances spectroscopy-based, non-invasive quality assessment of fruit. Similar studies on other climacteric fruits are recommended to support wider adoption of this technology.
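The accuracy metric used for model comparison can be sketched as follows. This is a minimal Python illustration, not the study's R implementation: accuracy is the sum of the diagonal of the confusion matrix (correct predictions) divided by the total number of test samples. The function name and the counts below are hypothetical and do not reproduce the study's results.

```python
def accuracy_from_confusion(matrix):
    """Accuracy = correctly classified samples / all samples.

    `matrix` is a square list of lists where matrix[i][j] counts samples
    of true class i that were predicted as class j.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    return correct / total


# Hypothetical two-class confusion matrix (NOT study data):
# rows = true class, columns = predicted class.
confusion = [
    [18, 2],   # true "low lycopene": 18 correct, 2 misclassified
    [3, 17],   # true "high lycopene": 3 misclassified, 17 correct
]

acc = accuracy_from_confusion(confusion)
print(f"accuracy = {acc:.3f}")
```

Accuracy alone can hide class imbalance, so in practice it is often read alongside the per-class counts in the same matrix.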