Abstract
A modification of ensemble Monte Carlo uninformative variable elimination (EMCUVE) that does not involve the use of random variables is proposed. By selecting the most informative variables in a spectral dataset, it aims to improve the performance of partial least squares (PLS) regression models, increase the consistency of results and reduce processing time. The proposed method (ensemble Monte Carlo variable selection, EMCVS) and its robust version (REMCVS) were compared with PLS models and with the existing EMCUVE method on three near-infrared (NIR) datasets: prediction of n-butanol in a five-solvent mixture, moisture in corn and glucosinolates in rapeseed. On these datasets, the proposed methods were more consistent, produced models with better predictive accuracy (lower root mean squared error of prediction) and required less computational time than the conventional EMCUVE method. In this application, the proposed method was applied to PLS regression coefficients, but it may, in principle, be used on any regression vector.
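To make the general idea concrete, the following is a minimal sketch of Monte Carlo, stability-based variable selection with an ensemble vote. It is not the EMCVS or REMCVS algorithm itself, whose details are not given in this abstract. Several things are assumptions made for illustration: the stability score (mean of the coefficients divided by their standard deviation across random calibration subsets, as commonly used in Monte Carlo UVE), the rule of keeping a fixed number of top-ranked variables, the majority vote across runs, and all function and parameter names. Because the abstract notes that the approach may be used on any regression vector, a ridge regression vector computed with NumPy stands in for the PLS regression coefficients.

```python
import numpy as np


def ridge_coefficients(X, y, alpha=1e-3):
    """Stand-in regression vector (ridge); the paper uses PLS coefficients."""
    Xc = X - X.mean(axis=0)  # centre predictors
    yc = y - y.mean()        # centre response
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)


def mc_stability(X, y, n_iter=50, frac=0.8, rng=None):
    """Stability of each coefficient over random calibration subsets.

    Assumed criterion: mean / standard deviation of the coefficient
    across Monte Carlo subsets drawn without replacement.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    k = int(frac * n)
    coefs = np.empty((n_iter, X.shape[1]))
    for i in range(n_iter):
        idx = rng.choice(n, size=k, replace=False)
        coefs[i] = ridge_coefficients(X[idx], y[idx])
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)


def ensemble_select(X, y, n_keep, n_runs=5, min_votes=3, seed=0):
    """Repeat the Monte Carlo ranking and keep variables that win a vote.

    Hypothetical rule: each run votes for its n_keep most stable
    variables; a variable is selected if it gets at least min_votes.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_runs):
        stability = np.abs(mc_stability(X, y, rng=rng))
        votes[np.argsort(stability)[-n_keep:]] += 1
    return np.flatnonzero(votes >= min_votes)
```

On synthetic data in which only a few columns of `X` drive `y`, calling `ensemble_select(X, y, n_keep=3)` returns the indices of the columns that are consistently ranked as most stable. No random noise variables are appended to `X` at any point, which is the feature of the proposed approach that the abstract highlights; how the actual method sets its selection threshold is not stated here.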