Validating the feasibility of the method of ensemble learning combined with FT-MIR for the discrimination of wild Paris polyphylla var. Yunnanensis from different geographical sources

Mingyu Han, Yuanzhong Wang

doi:10.1016/j.microc.2024.110824

Abstract

Paris polyphylla var. yunnanensis (PPY), a widely used medicinal plant of high economic value, is mainly distributed in south-western China. Because the growing environment strongly influences PPY quality, efficient methods are needed to discriminate its geographical origin. Traditional machine learning models rely on a single learner to classify samples, and a single learner often struggles to perform well on complex data. Ensemble learning integrates the strengths of multiple learners to improve model performance. In this study, Partial Least Squares Discriminant Analysis (PLS-DA), Random Subspace Method (RSM), Random Forest (RF), and Support Vector Machine (SVM) models were used to discriminate wild PPY from different origins and thereby verify the feasibility of ensemble learning for this task. In parallel, spectral preprocessing with multiplicative scatter correction (MSC), standard normal variate (SNV), derivative operations (first, second, and third derivatives), and combinations of these methods was compared. The results show that combined preprocessing greatly improves model performance. The RSM gave the most stable performance, with an accuracy of more than 85%. These results indicate that ensemble learning combined with Fourier transform mid-infrared spectroscopy (FT-MIR) is feasible for discriminating wild PPY of different geographical origins.
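The preprocessing steps named in the abstract (MSC, SNV, and derivatives) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `snv`, `msc`, and `derivative` are hypothetical, spectra are assumed to be stored one per row, and the derivative uses a plain finite difference (`np.gradient`) as a stand-in for whatever smoothing derivative filter the study applied.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is regressed on the reference (row ~ slope * reference + intercept)
    and then corrected as (row - intercept) / slope. If no reference is given,
    the mean spectrum of the data set is used (an assumption of this sketch).
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


def derivative(spectra, order=1):
    """Finite-difference derivative along the wavenumber axis, applied `order` times.

    Assumption: unit spacing between points and no smoothing.
    """
    spectra = np.asarray(spectra, dtype=float)
    for _ in range(order):
        spectra = np.gradient(spectra, axis=1)
    return spectra
```

A combined preprocessing such as SNV followed by a second derivative would then be written as `derivative(snv(raw), order=2)`, where `raw` is a hypothetical samples-by-wavenumbers array.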

Full Text