Abstract
The high dimensionality of spectral datasets is a significant challenge for researchers. It calls for effective methods that can extract the optimal variable subset, one that improves the accuracy of predictions or classifications. In this study, a hybrid variable selection method is proposed and named BOSS-IRVS. It is based on an incremental number of variables and combines the bootstrapping soft shrinkage (BOSS) method with the interval random variable selection (IRVS) method. The BOSS method is used to determine the informative intervals, while the IRVS method searches for informative variables within the intervals determined by BOSS. The proposed BOSS-IRVS method was tested on seven publicly accessible near-infrared (NIR) spectroscopic datasets of corn, diesel fuel, soy, wheat protein, and hemoglobin. Its performance was compared with that of two well-established variable selection methods: BOSS, and a hybrid variable selection strategy based on continuous shrinkage of the variable space (VCPA-IRIV). The experimental results clearly show that BOSS-IRVS outperforms both VCPA-IRIV and BOSS on all tested datasets. The percentage improvements in prediction accuracy were 15.4 and 15.3 for corn moisture, 13.4 and 49.8 for corn oil, 41.5 and 50.6 for corn protein, 12.6 and 5.6 for soy moisture, 0.6 and 6.3 for total diesel fuel, 19.9 and 14.3 for wheat protein, and 5.8 and 20.3 for hemoglobin. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
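The abstract reports improvements as percentages without stating the formula. A common convention in chemometrics, assumed here and not confirmed by the source, is the relative reduction in root-mean-square error of prediction (RMSEP) of the proposed method against a reference method. Under that assumption, the arithmetic is a one-liner. The function name is hypothetical, and the RMSEP values in the example are invented for illustration.

```python
def relative_improvement(rmsep_reference: float, rmsep_proposed: float) -> float:
    """Percentage reduction in RMSEP of the proposed method relative to a
    reference method; positive values mean the proposed method predicts better.
    (Assumed definition; the source does not state its formula.)"""
    if rmsep_reference <= 0:
        raise ValueError("reference RMSEP must be positive")
    return 100.0 * (rmsep_reference - rmsep_proposed) / rmsep_reference


# Hypothetical RMSEP values, for illustration only (not results from the study).
print(relative_improvement(2.0, 1.5))  # 25.0
```

Under this convention, a negative value would mean the proposed method predicts worse than the reference.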
Highlights
In recent years, near-infrared (NIR) spectroscopy has gained wide acceptance in fields such as agriculture and the petrochemical and pharmaceutical industries because it can record spectra of both solid and liquid samples. NIR spectra typically consist of broad, weak, non-specific, and overlapping bands, together with some irrelevant variables [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Masini.
Deng et al. proposed a new and effective single variable selection method named the bootstrapping soft shrinkage (BOSS) method [4]. This method significantly improved prediction accuracy on three NIR spectroscopic datasets. It also outperformed partial least squares (PLS), Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS), and genetic algorithm coupled with partial least squares (GA-PLS).
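The sketch below is a heavily simplified illustration of the BOSS idea. It is written from the general description of the method and not from the code in [4]. Each iteration draws many sub-models by weighted bootstrap sampling of variables and keeps the best ones. It then re-weights the variables by the size of their regression coefficients, so uninformative variables shrink gradually ("soft shrinkage") rather than being removed outright. The following are assumptions, not features of the published method: ordinary least squares stands in for PLS, a fixed even/odd sample split stands in for cross-validation, and the loop runs a fixed number of iterations. The function name and parameters are hypothetical.

```python
import numpy as np


def boss_sketch(X, y, n_submodels=200, best_ratio=0.1, n_iter=10, seed=0):
    """Simplified BOSS-style variable selection (illustrative only).

    Assumptions not taken from [4]: ordinary least squares stands in for PLS,
    a fixed even/odd split of the samples stands in for cross-validation, and
    the loop runs a fixed number of iterations.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    weights = np.full(n_vars, 1.0 / n_vars)  # every variable starts equally likely
    train = np.arange(n_samples) % 2 == 0
    test = ~train
    for _ in range(n_iter):
        n_active = np.count_nonzero(weights)
        results = []
        for _ in range(n_submodels):
            # Weighted bootstrap: draw n_active indices with replacement using the
            # current weights; duplicates collapse, so each sub-model is a subset.
            subset = np.unique(rng.choice(n_vars, size=n_active, replace=True, p=weights))
            coef, *_ = np.linalg.lstsq(X[train][:, subset], y[train], rcond=None)
            rmse = np.sqrt(np.mean((X[test][:, subset] @ coef - y[test]) ** 2))
            results.append((rmse, subset, coef))
        results.sort(key=lambda r: r[0])
        n_best = max(1, int(best_ratio * n_submodels))
        # Soft shrinkage: new weights are the summed |coefficients| over the best
        # sub-models, so weak variables fade gradually instead of being cut at once.
        new_weights = np.zeros(n_vars)
        for _, subset, coef in results[:n_best]:
            new_weights[subset] += np.abs(coef)
        weights = new_weights / new_weights.sum()
    return np.flatnonzero(weights), weights
```

With synthetic data in which `y` depends on only two columns of `X`, those two columns should end up carrying almost all of the weight. The published method instead continues shrinking until one variable remains and then keeps the subset with the lowest cross-validated error.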
The parameter settings for VCPA-IRIV (where IRIV denotes iteratively retaining informative variables) are as follows:
- α = 20, the mean number of each binary matrix sampling (BMS).
- EDF_run = 50, the number of exponentially decreasing function (EDF) runs.
- BMS_run = 1000, the number of BMS runs.
- σ = 0.1, the ratio of the best models among the K sub-models.
- L = 100, the number of variables left after the final EDF run.
- A_max = 10, the maximum number of principal components to extract for PLS.
- fold = 5, the number of cross-validation groups.
- method = center, the pretreatment method.
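The reported settings can be collected in a single configuration object so they are easy to reproduce. This is a hypothetical sketch. The class and field names are illustrative and do not correspond to the API of any particular VCPA-IRIV implementation. Only the values are taken from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VcpaIrivSettings:
    """Reported VCPA-IRIV parameter values (field names are hypothetical)."""
    alpha: int = 20               # α: mean number of each BMS sampling
    edf_runs: int = 50            # number of exponentially decreasing function (EDF) runs
    bms_runs: int = 1000          # number of binary matrix sampling (BMS) runs
    sigma: float = 0.1            # ratio of best models among the K sub-models
    n_left: int = 100             # L: variables left after the final EDF run
    a_max: int = 10               # maximum number of PLS components to extract
    cv_folds: int = 5             # number of cross-validation groups
    pretreatment: str = "center"  # pretreatment method


settings = VcpaIrivSettings()
print(asdict(settings))
```

Making the class frozen prevents a run from silently altering the reported settings.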
Summary
Near-infrared (NIR) spectroscopy has gained wide acceptance in fields such as agriculture and the petrochemical and pharmaceutical industries because it can record spectra of both solid and liquid samples. Deng et al. proposed a new and effective single variable selection method named BOSS [4]. It significantly improved prediction accuracy on three NIR spectroscopic datasets and outperformed partial least squares (PLS), Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS), and genetic algorithm coupled with partial least squares (GA-PLS). Among the related models, only FOSS considers the selection of spectral intervals. BOSS, SBOSS, and SMCPA do not, even though interval selection can provide a reasonable interpretation. Using interval selection in the proposed model is expected to improve accuracy, because the vibrational spectral band of a chemical group generally has a width of 4–200 cm⁻¹ [6].
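The band-width argument can be made concrete. If the spectral variables are evenly spaced, an interval covering roughly one vibrational band contains about (band width ÷ variable spacing) variables. The helper below is a hypothetical illustration of that arithmetic. Its name, its parameters, and the fixed-width scheme are assumptions; it is not the partitioning actually used by BOSS-IRVS, which the text does not specify.

```python
def partition_intervals(n_vars: int, step_cm1: float, band_width_cm1: float):
    """Split n_vars evenly spaced spectral variables into contiguous intervals
    whose width approximates one vibrational band (hypothetical helper)."""
    if n_vars <= 0 or step_cm1 <= 0 or band_width_cm1 <= 0:
        raise ValueError("all arguments must be positive")
    # Variables per band, rounded to the nearest integer; at least one per interval.
    width = max(1, round(band_width_cm1 / step_cm1))
    # Half-open index ranges [start, end); the last interval may be shorter.
    return [(start, min(start + width, n_vars)) for start in range(0, n_vars, width)]


# 10 variables spaced 8 cm^-1 apart, band width 32 cm^-1 -> 4 variables per interval.
print(partition_intervals(10, 8.0, 32.0))  # [(0, 4), (4, 8), (8, 10)]
```

Because bands range from 4 to 200 cm⁻¹, the chosen width is a trade-off. Narrow intervals can split a band, while wide intervals can merge unrelated bands.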