Machine learning modeling and prediction of peanut protein content based on spectral images and stoichiometry

Man Zhou, Li Wang, Hejun Wu, Qingye Li, Meiliang Li, Zhiqing Zhang, Yongpeng Zhao, Zhiwei Lu, Zhiyong Zou

doi:10.1016/j.lwt.2022.114015

Abstract

For rapid, nondestructive detection of peanut protein content, an experimental method combining hyperspectral imaging and spectrophotometry was proposed. To address data redundancy and noise, ten feature extraction algorithms were applied, and the analysis revealed that the optimal characteristic bands for protein content lay between 400 and 550 nm. Based on these results, the original spectral data were preprocessed with a median filtering algorithm (MF), the top 30 feature bands were extracted with the XGBoost algorithm, and the protein content prediction model was built with the Ridge algorithm. The reference (physicochemical) protein content values were measured by spectrophotometry. The optimal model was MF-XGBoost-Ridge, with its hyperparameter α tuned by the Optuna algorithm. It achieved RMSE = 0.009 and a correlation coefficient R = 0.886, with a fitting time of only 0.02 s. Compared with traditional machine learning models, this model achieved higher prediction accuracy with a shorter fitting time.
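The abstract describes a four-stage pipeline: median filtering of the spectra, XGBoost ranking of bands (top 30 kept), Ridge regression, and tuning of α with Optuna. The following is a minimal NumPy-only sketch of that flow under stated assumptions. It is not the authors' implementation. XGBoost feature importances are replaced here by a simpler ranking on absolute Pearson correlation, and Optuna is replaced by a plain grid search over α on a held-out validation split. The window size, split ratios, α range, synthetic data, and all function names (`median_filter_spectra`, `top_k_bands`, `ridge_fit`, `tune_alpha`, `mf_rank_ridge`, `make_synthetic_spectra`) are hypothetical.

```python
import numpy as np


def median_filter_spectra(X, window=5):
    """Median-filter each spectrum (row) along the wavelength axis.

    Edge values are repeated as padding, so the output has the same shape as X.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = np.pad(X, ((0, 0), (half, half)), mode="edge")
    # Shifted views stacked to shape (n_samples, n_bands, window); row j holds the window centred on band j.
    windows = np.stack([padded[:, i:i + X.shape[1]] for i in range(window)], axis=-1)
    return np.median(windows, axis=-1)


def top_k_bands(X, y, k=30):
    """Rank bands by |Pearson r| with y. This is a stand-in for XGBoost importances (assumption)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    return np.argsort(-np.abs(r))[:k]


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def tune_alpha(X_tr, y_tr, X_val, y_val, alphas):
    """Grid search over alpha on a validation split. This is a stand-in for Optuna (assumption)."""
    def val_rmse(a):
        w, b = ridge_fit(X_tr, y_tr, a)
        return float(np.sqrt(np.mean((X_val @ w + b - y_val) ** 2)))
    return float(min(alphas, key=val_rmse))


def mf_rank_ridge(X, y, k=30, window=5, alphas=None, seed=0):
    """Sketch of the pipeline: MF, then top-k band ranking, then ridge with tuned alpha.

    Uses a hypothetical 60/20/20 train/validation/test split. Returns
    (band indices, alpha, test RMSE, test R).
    """
    if alphas is None:
        alphas = np.logspace(-4, 2, 13)  # assumed search range for alpha
    idx = np.random.default_rng(seed).permutation(len(y))
    n_tr, n_val = int(0.6 * len(y)), int(0.2 * len(y))
    tr, val, te = idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]
    Xf = median_filter_spectra(X, window)
    bands = top_k_bands(Xf[tr], y[tr], k)  # ranked on training data only
    Xs = Xf[:, bands]
    alpha = tune_alpha(Xs[tr], y[tr], Xs[val], y[val], alphas)
    fit_idx = np.concatenate([tr, val])
    w, b = ridge_fit(Xs[fit_idx], y[fit_idx], alpha)  # refit on train + validation
    pred = Xs[te] @ w + b
    rmse = float(np.sqrt(np.mean((pred - y[te]) ** 2)))
    r = float(np.corrcoef(pred, y[te])[0, 1])
    return bands, alpha, rmse, r


def make_synthetic_spectra(n=200, n_bands=120, noise=0.005, seed=42):
    """Hypothetical synthetic data (not the peanut measurements): signal peaked near 470 nm."""
    rng = np.random.default_rng(seed)
    wavelengths = np.linspace(400, 1000, n_bands)
    y = rng.uniform(0.20, 0.30, n)  # synthetic protein fractions
    signal = np.exp(-((wavelengths - 470) / 40) ** 2)
    X = 0.5 + y[:, None] * signal[None, :] + rng.normal(0, noise, (n, n_bands))
    return wavelengths, X, y


if __name__ == "__main__":
    wl, X, y = make_synthetic_spectra()
    bands, alpha, rmse, r = mf_rank_ridge(X, y)
    print(f"alpha={alpha:.4g}  test RMSE={rmse:.4f}  R={r:.3f}")
    print("selected wavelengths (nm):", np.sort(wl[bands]).round(0))
```

To stay closer to the paper, one could rank bands with `xgboost.XGBRegressor(...).fit(X, y).feature_importances_` and search α with an Optuna study in place of `top_k_bands` and `tune_alpha`. The separate validation split keeps the reported RMSE and R from being computed on the data used to choose α.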
