Abstract

As spectroscopic instruments generate increasingly high-dimensional data, variable selection has become an important part of spectral modeling. This research proposes a new variable selection algorithm, named adaptive variable re-weighting and shrinking approach (AVRSA), based on model population analysis (MPA) and weighted bootstrap sampling (WBS). In each iteration, WBS generates the sub-datasets used for modeling, and the variable weights and the variable space are updated by statistically evaluating the optimal sub-models. Unlike most other variable selection methods, AVRSA requires the average prediction performance of the optimal sub-models in each iteration to improve on that of the previous iteration. Iteration stops when no better sub-models are generated, and the variables retained at that point are taken as the most informative. The method is evaluated on three near-infrared (NIR) datasets. Three variable selection methods, competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE) and iteratively variable subset optimization (IVSO), are used for comparison. Compared with these algorithms, AVRSA selects the fewest variables, which is convenient for the development of portable instruments. Compared with the full-spectrum PLS model, the root mean square error of prediction on the validation set (RMSEP) for corn starch decreases from 0.2614 to 0.1093, and the RMSEP for corn protein decreases from 0.0977 to 0.0374. In addition, the RMSEP for the wheat dataset decreases from 0.2585 to 0.2157, and the RMSEP for the wheat kernel dataset decreases from 0.7816 to 0.6661. The results show that the proposed method efficiently identifies the most informative variables in high-dimensional spectra and improves the model's prediction performance.
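
The sampling and re-weighting loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the update rule, so the function names (`weighted_bootstrap_sample`, `update_weights`), the frequency-based weight update, and the use of the first three subsets as a stand-in for the "optimal sub-models" are all assumptions made for illustration. In a real run, the sub-models would be ranked by a cross-validated error from a PLS model built on each subset.

```python
import random


def weighted_bootstrap_sample(weights, rng):
    """Draw len(weights) variable indices with replacement, with probability
    proportional to each variable's weight, and return the unique indices."""
    p = len(weights)
    drawn = rng.choices(range(p), weights=weights, k=p)
    return sorted(set(drawn))


def update_weights(best_subsets, n_variables):
    """Assumed update rule: the new weight of a variable is the fraction of
    the best sub-models that contain it."""
    counts = [0] * n_variables
    for subset in best_subsets:
        for j in subset:
            counts[j] += 1
    return [c / len(best_subsets) for c in counts]


rng = random.Random(0)

# Five hypothetical wavelengths; variable 2 has weight 0 and is never drawn,
# which is how the variable space shrinks between iterations.
weights = [1.0, 1.0, 0.0, 1.0, 1.0]

# Generate a population of variable subsets for this iteration.
subsets = [weighted_bootstrap_sample(weights, rng) for _ in range(10)]

# Placeholder for model evaluation: here the first three subsets stand in
# for the sub-models with the lowest cross-validated error.
best_subsets = subsets[:3]
new_weights = update_weights(best_subsets, len(weights))
```

Variables that appear often in the best sub-models receive higher weights and are sampled more often in the next round, while variables whose weight reaches zero drop out of the search space.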
