Abstract

Multivariate statistical methods such as partial least squares regression (PLSR) are commonly used to handle the high-dimensional data space of near-infrared (NIR) spectral datasets. PLSR is useful for tackling the multicollinearity and heteroscedasticity that are common in such data. When the original input space has a nonlinear structure, however, the classical PLSR model might not be appropriate. In addition, contamination by multiple outliers and high leverage points (HLPs) could further damage the model. HLPs comprise both good leverage points (GLPs) and bad leverage points (BLPs). Removing the BLPs is warranted because they have a significant impact on the parameter estimates and can slow down convergence. The GLPs, on the other hand, improve the efficiency of model calibration and should not be eliminated. This study introduces two robust alternatives to the existing kernel partial least squares (KPLS) regression: the kernel partial robust GM6-estimator (KPRGM6) regression and the kernel partial robust modified GM6-estimator (KPRMGM6) regression. Nonlinearity is handled through kernel-based learning, which nonlinearly projects the original input data matrix into a high-dimensional feature space corresponding to a reproducing kernel Hilbert space (RKHS). To increase robustness, improved GM6 estimators are combined with the nonlinear PLSR. In several artificial dataset scenarios from Monte Carlo simulations and on two NIR spectral datasets, the proposed robust KPRMGM6 is found to be superior to the robust KPRGM6 and the non-robust KPLS.
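
The kernel step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not include the GM6-based robust weighting; it only shows how a Gram matrix is typically built and centered in feature space before a kernel PLS fit. The Gaussian (RBF) kernel, the `width` parameter, and the function names are assumptions chosen for illustration, since the abstract does not name a kernel.

```python
import math


def rbf_kernel_matrix(X, width):
    """Gram matrix with K[i][j] = exp(-||x_i - x_j||^2 / width).

    X is a list of samples (each a list of floats); `width` is an
    assumed kernel-width parameter, not a value taken from the paper.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / width)
    return K


def center_kernel_matrix(K):
    """Center a symmetric Gram matrix in feature space.

    Equivalent to K - 1K/n - K1/n + 1K1/n^2, where 1 is the n-by-n
    matrix of ones. Because K is symmetric, row and column means agree.
    """
    n = len(K)
    row_means = [sum(row) / n for row in K]
    grand_mean = sum(row_means) / n
    return [
        [K[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Three toy "spectra" with two variables each (made-up numbers).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
K = rbf_kernel_matrix(X, width=2.0)
K_centered = center_kernel_matrix(K)
```

After centering, every row and column of `K_centered` sums to zero up to rounding, which corresponds to mean-centering the implicitly mapped samples in the RKHS. A kernel PLS procedure would then extract latent components from this centered matrix instead of from the raw spectra.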

Highlights

  • In vibrational spectroscopic techniques, multivariate statistical analysis is a common method for the pre-treatment screening, processing, and interpretation of near-infrared (NIR) spectral data

  • The methods combine the benefits of linear partial least squares regression (PLSR) and kernel-based learning in a reproducing kernel Hilbert space (RKHS) with the robustness of the modified GM6 estimators

Summary

Introduction

Multivariate statistical analysis is a common method for the pre-treatment screening, processing, and interpretation of near-infrared (NIR) spectral data. It allows a large number of spectra to be processed in relation to the chemical quantities of interest. Because of their complexity, such datasets suffer from contamination by multiple outliers and high leverage points (HLPs). These factors can lead to inaccurate interpretation and can make the analysis computationally intensive. It therefore seems timely to introduce nonlinear robust multivariate alternatives that can handle irregular data space problems and identify outliers and bad leverage points (BLPs) in the dataset.

