Abstract

The MC-UVE-SPA method is commonly proposed as a variable selection approach for multivariate calibration. However, the SPA tends to select wavelength variables that are sparsely distributed over the wavelength ranges of the variables selected by the MC-UVE algorithm, and the MC-UVE-SPA cascade does not resolve the problem of wavelength point discontinuity. This problem is addressed in this paper by proposing a moving-window- (MW-) improved MC-UVE-SPA wavelength selection algorithm. The proposed algorithm improves the continuity of the selected wavelength variables and thereby better exploits the advantages of the MC-UVE algorithm and the SPA to obtain regression models with high prediction accuracy. The MC-UVE, MC-UVE-SPA, and MC-UVE-SPA-MW algorithms are applied to select wavelength variables from the NIR spectral absorbance data of corn, diesel fuel, and ethylene. Here, partial least squares regression (PLSR) models reflecting the oil content of corn, the boiling point of diesel fuel, and the ethylene concentration are established after conducting wavelength selection using the MC-UVE algorithm, and corresponding multiple linear regression (MLR) models are established after conducting wavelength selection using the MC-UVE-SPA and MC-UVE-SPA-MW algorithms. Experimental results demonstrate that the progressive elimination of uncorrelated and collinear variables generates increasingly simplified partial-spectrum models with greater prediction accuracy than the full-spectrum model. Among the three wavelength selection algorithms, the MC-UVE-SPA selected the fewest wavelength variables, while the proposed MC-UVE-SPA-MW algorithm provided models with the greatest prediction accuracy.

Highlights

  • Because it is simple, rapid, and noninvasive and requires no sample pretreatment, near-infrared (NIR) spectroscopy [1] has been adopted as a popular analytical tool for both qualitative and quantitative analyses in various fields [2,3,4,5].

  • The quantitative analysis of NIR spectral data is generally conducted through the construction of regression models, such as those based on principal component analysis (PCA) [6], partial least squares (PLS) regression [7], and multiple linear regression (MLR) [8], which take the characteristic wavelengths of the spectral data as input variables.

  • Few studies have considered improving the continuity of the selected wavelengths in wavelength point selection algorithms. Therefore, this paper considers the continuity of the wavelengths selected by the Monte Carlo (MC)-uninformative variable elimination (UVE)-successive projections algorithm (SPA).


Summary

Introduction

The quantitative analysis of NIR spectral data is generally conducted through the construction of regression models, such as those based on principal component analysis (PCA) [6], partial least squares (PLS) regression [7], and multiple linear regression (MLR) [8], which take the characteristic wavelengths of the spectral data as input variables. The development of modern analytical instruments has made it possible to acquire NIR spectral data that can contain hundreds to tens of thousands of individual wavelengths [9]. Variable selection has become the dominant method of interest in recent years for the development of NIR spectral analysis technology and chemometrics [11,12,13,14]. Uninformative wavelength variables have either no effect or a negative effect on the modeling performance. The wavelength selection process fulfils three purposes: (1) providing models with greater predictive capability, (2) obtaining wavelength variables that provide greater modeling efficiency, and (3) providing simpler models with improved interpretability [9]. The most commonly employed wavelength selection algorithms developed thus far include uninformative variable elimination (UVE) and the successive projections algorithm (SPA).
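
To make the SPA named above more concrete, the following Python sketch shows only its projection phase: starting from one wavelength column, it repeatedly projects the remaining columns onto the subspace orthogonal to the most recently chosen column and picks the column with the largest remaining norm, so that each new wavelength is as little collinear with the previous ones as possible. This is a minimal illustration, not the authors' implementation. The function name `spa_select`, the mean-centering of the columns, and the fixed starting column are assumptions made for the example, and the validation step that the full SPA uses to choose the starting column and subset size is omitted.

```python
import math


def spa_select(X, n_select, start=0):
    """Projection phase of the successive projections algorithm (sketch).

    X is a list of rows (samples x wavelengths). Returns the indices of
    n_select columns, beginning with the column index `start`.
    """
    n_rows = len(X)
    n_cols = len(X[0])

    # Assumption: work on mean-centered columns, stored as column vectors.
    cols = []
    for j in range(n_cols):
        col = [X[i][j] for i in range(n_rows)]
        mean = sum(col) / n_rows
        cols.append([v - mean for v in col])

    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]  # most recently selected (projected) column
        ref_norm_sq = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            if ref_norm_sq > 0.0:
                # Remove the component of column j that lies along ref.
                scale = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm_sq
                cols[j] = [a - scale * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


# Toy data: column 1 is nearly collinear with column 0, column 2 is not.
X_demo = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.2, -1.0],
]
print(spa_select(X_demo, 2, start=0))  # [0, 2]
```

In the toy data, the second column is almost a multiple of the first, so after projection its norm is close to zero and the third column is chosen instead. The same behavior is what makes the wavelengths selected by the SPA sparse: neighboring wavelengths in a real spectrum are highly collinear and are therefore rarely selected together.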

