To achieve accurate, rapid, and non-destructive determination of oil content in maize, a neural fitting model that combines feature selection and feature extraction is proposed. Competitive adaptive re-weighted sampling (CARS) is used to select features, while a deep sparse autoencoder (DSAE) network is used to extract them. The 160 spectra from public near-infrared spectral datasets of maize served as experimental samples. Several preprocessing approaches were tested, and the first-order derivative performed best. To capture features at different levels of abstraction in the spectral data, DSAE networks with two or three hidden layers were developed. The number of hidden layer nodes (30, 50, 70, 100, 200, or 300) and the sparsity parameter (0.001, 0.004, 0.007, 0.01, 0.04, 0.07, 0.1, 0.2, …, 1) were tuned, and partial least squares regression (PLSR) models were built on the extracted features to test the feature learning effect. The best performance was obtained with a three-layer network with 200 hidden layer nodes and a sparsity parameter of 0.04. This network extracts three levels of features from the preprocessed spectral data: F1, F2, and F3. Applying the CARS algorithm for variable selection to the preprocessed spectral data, F1, F2, and F3 yields Full_cars, F1_cars, F2_cars, and F3_cars, respectively. These subsets are then merged into a final feature set used to build a neural fitting network (FNN). On the test set, the trained model achieves a correlation coefficient of 0.964205 and a mean square error (MSE) of 0.00636809. Compared with support vector machine regression (SVR), PLSR, and Gaussian process regression (GPR), the proposed model shows better fitting accuracy and generalization performance. In summary, this paper introduces a new regression analysis method based on near-infrared spectra.
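For readers unfamiliar with the quantities named above, the following minimal Python sketch illustrates three of them: a first-order derivative of a spectrum, a sparsity penalty for autoencoder hidden units, and the two reported test metrics. It is not the authors' implementation. The function names are hypothetical, the derivative is approximated by adjacent-point differences (the abstract does not state the derivative method), and the penalty uses the common Kullback-Leibler formulation for sparse autoencoders, which is an assumption since the abstract gives only the sparsity parameter value.

```python
import math

def first_derivative(spectrum):
    """Approximate the first-order derivative by adjacent-point differences.
    (Assumption: the abstract does not say how the derivative is computed.)"""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def kl_sparsity_penalty(rho, mean_activations):
    """KL-divergence sparsity penalty, summed over hidden units.
    rho is the target mean activation (the 'sparsity parameter');
    mean_activations holds each hidden unit's average activation in (0, 1).
    (Assumption: this is the common formulation, not confirmed by the source.)"""
    total = 0.0
    for rho_hat in mean_activations:
        total += rho * math.log(rho / rho_hat)
        total += (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return total

def mean_square_error(y_true, y_pred):
    """Mean of squared differences between reference and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def correlation_coefficient(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    toy_spectrum = [0.10, 0.12, 0.17, 0.15]
    print(first_derivative(toy_spectrum))
    print(kl_sparsity_penalty(0.04, [0.04, 0.10, 0.50]))
    toy_true = [3.1, 3.5, 3.8, 4.0]
    toy_pred = [3.0, 3.6, 3.7, 4.1]
    print(mean_square_error(toy_true, toy_pred))
    print(correlation_coefficient(toy_true, toy_pred))
```

Under this formulation the penalty is zero when every hidden unit's mean activation equals the sparsity parameter and grows as the activations move away from it, which is why a small value such as 0.04 encourages mostly inactive hidden units.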