Abstract

Building a well-performing model is crucial for near-infrared technology, which is used in various fields. Full-spectrum modeling reduces the predictive ability of models because it includes many invalid variables; variable selection can improve model performance. Several variable selection approaches based on evolutionary and swarm intelligence algorithms have been successfully implemented. However, most of these approaches do not account for the relevance between the near-infrared spectral features and the analyte, which increases the risk of model overfitting. In this study, a spectral variable selection approach based on a fast nondominated sorting genetic algorithm is proposed to improve the relevance between the near-infrared spectral features and the analyte. The approach uses two objective functions: (1) maximizing the ratio of the interclass to the in-class sum of standard deviations and (2) minimizing the sum of the correlation coefficients between the selected variables. The former focuses on the relevance between the spectral features and the analyte, whereas the latter reduces the correlation between the selected variables to avoid invalid variables. The fast nondominated sorting genetic algorithm yields multiple solutions in the form of a Pareto solution set, and the optimal solution is determined using the root mean square error of cross-validation. To validate the approach, partial least squares models for tobacco nicotine and total sugar were built with varying numbers of selected variables. Compared with a model built on 1036 variables, the root mean square error of prediction of the nicotine model with 150–600 variables decreased by more than 15% and by up to 23.93%, and that of the total sugar model with 100–600 variables decreased by more than 7.5% and by up to 13.79%. These results show that model performance can be significantly improved over a wide range of numbers of variables selected by the fast nondominated sorting genetic algorithm. Additionally, the fast nondominated sorting genetic algorithm gave better predictions than interval selectivity ratio and full-spectrum partial least squares modeling.
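
To make the two objective functions and the Pareto solution set concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the exact formulas, so several things here are assumptions: samples are assumed to be grouped into classes (for example by binning the analyte value), the interclass term is taken as the standard deviation of the class means summed over the selected variables, the in-class term is taken as the within-class standard deviations summed over classes and variables, and the correlation objective sums absolute pairwise Pearson coefficients. The names `objectives` and `pareto_front` are hypothetical.

```python
import numpy as np


def objectives(X, groups, mask):
    """Evaluate one candidate variable subset (assumed formulas, see lead-in).

    X      : (n_samples, n_variables) spectra
    groups : (n_samples,) class label per sample (assumed: binned analyte)
    mask   : (n_variables,) boolean, True for selected variables
    Returns (f1, f2): f1 is to be maximized, f2 is to be minimized.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    Xs = X[:, np.asarray(mask, dtype=bool)]
    labels = np.unique(groups)

    # Per-class mean and standard deviation of every selected variable.
    means = np.array([Xs[groups == g].mean(axis=0) for g in labels])
    stds = np.array([Xs[groups == g].std(axis=0) for g in labels])

    inter = means.std(axis=0).sum()  # spread of class means, summed over variables
    intra = stds.sum()               # within-class spread, summed over classes and variables
    f1 = inter / intra

    # Sum of absolute pairwise correlations between selected variables.
    if Xs.shape[1] < 2:
        f2 = 0.0
    else:
        corr = np.corrcoef(Xs, rowvar=False)
        upper = np.triu_indices_from(corr, k=1)
        f2 = float(np.abs(corr[upper]).sum())
    return float(f1), f2


def pareto_front(scores):
    """Indices of nondominated candidates.

    scores[:, 0] is maximized and scores[:, 1] is minimized. A candidate is
    dominated if another is at least as good on both objectives and strictly
    better on at least one.
    """
    scores = np.asarray(scores, dtype=float)
    front = []
    for i, (a1, a2) in enumerate(scores):
        no_worse = (scores[:, 0] >= a1) & (scores[:, 1] <= a2)
        better = (scores[:, 0] > a1) | (scores[:, 1] < a2)
        if not np.any(no_worse & better):
            front.append(i)
    return front
```

This sketch covers only the objective evaluation and a single nondominance filter. The full algorithm additionally ranks successive fronts, applies crowding-distance selection, and generates new candidates by crossover and mutation. The final step described above, choosing one solution from the Pareto set by the root mean square error of cross-validation of a partial least squares model, is also not shown.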

