Abstract

Near-infrared (NIR) spectral data are often extremely high-dimensional. Reducing their dimensionality removes redundant information, lowers the correlation between spectral variables, and simplifies the model, all of which are crucial to improving model performance. As a nonlinear feature extraction method, Laplacian Eigenmaps (LE) preserves the local neighborhood information of the dataset, is highly robust, and is simple to compute. However, when LE maps the data from a high-dimensional space to a low-dimensional space, it is often disturbed by irrelevant information and multicollinearity in the spectral data, which lowers the model's prediction performance. The Random Frog (RF) algorithm can eliminate noise and collinearity in the spectrum. Therefore, before applying LE, we use RF to eliminate irrelevant information in the spectrum and reduce the correlation between spectral variables, which increases the efficiency of LE. We applied the combined RF + LE algorithm to reduce the dimensionality of two public NIR spectroscopy (NIRS) datasets (a soil dataset and a pharmaceutical tablet dataset) and compared it with RF and LE used alone. Regression models were built with Partial Least Squares Regression (PLSR) and Support Vector Regression (SVR). The experimental results show that, compared with RF or LE alone, the RF + LE combination reduces the number of spectral variables and the model complexity while improving the prediction accuracy and stability of the regression models. It is therefore an effective dimensionality reduction method for NIR spectra.
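
To make the LE step concrete, the following is a minimal sketch of Laplacian Eigenmaps in NumPy. It is not the authors' implementation, and it omits the RF variable-selection step entirely. The function name `laplacian_eigenmaps`, its parameters, the median heuristic for the heat-kernel width, and the synthetic "spectra" are all assumptions made for illustration.

```python
import numpy as np


def laplacian_eigenmaps(X, n_components=2, n_neighbors=5, t=None):
    """Embed the rows of X into n_components dimensions (illustrative sketch)."""
    n = X.shape[0]

    # Pairwise squared Euclidean distances between samples (spectra).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)

    # Heat-kernel width. Assumption: default to the median squared distance.
    if t is None:
        t = np.median(d2[d2 > 0])

    # Symmetrised k-nearest-neighbour graph (column 0 of the sort is the point itself).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    A = np.zeros((n, n), dtype=bool)
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = True
    A = A | A.T

    # Heat-kernel weights on graph edges, degrees, and graph Laplacian L = D - W.
    W = np.where(A, np.exp(-d2 / t), 0.0)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Solve the generalised problem L y = lambda D y through the symmetric
    # normalised Laplacian D^{-1/2} L D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order

    # Drop the first eigenvector (eigenvalue ~0, trivial when the graph is connected).
    Y = d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
    return Y, vals[1:n_components + 1]


# Synthetic stand-in for spectra: a Gaussian peak that shifts with a latent variable.
latent = np.linspace(0.0, 1.0, 40)
wavelengths = np.linspace(0.0, 1.0, 50)
X = np.exp(-((wavelengths[None, :] - latent[:, None]) ** 2) / 0.1)

Y, eigenvalues = laplacian_eigenmaps(X, n_components=2, n_neighbors=5)
print(Y.shape, eigenvalues)
```

In a pipeline like the one described above, the columns of `X` would first be restricted to the variables retained by RF, and `Y` would then serve as the input to a regressor such as PLSR or SVR. Note that this formulation embeds only the samples it is given; handling new samples requires an out-of-sample extension, which the sketch does not address.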
