Rhizoma Coptidis is a Chinese herbal medicine with antibacterial and anti-inflammatory properties, and it is widely used in modern medicine. Its quality is directly determined by its berberine content. Fourier transform near-infrared (FT-NIR) spectroscopy is a commonly used non-destructive method for rapidly determining berberine content. In contrast to single supervised learning algorithms, ensemble learning combines multiple individual learners into a more stable, better-performing strong learner. This study collected FT-NIR spectra of Rhizoma Coptidis and established a chemometric model using a stacking ensemble of multiple regression models. Partial least squares (PLS), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) regression models were chosen as candidate base models, and different stacking models were built from random combinations of them. To make full use of the strengths of each model and improve predictive capability, an adaptive inertia weight particle swarm optimization (AWPSO) algorithm was used to search for the optimal model parameters. Model performance was evaluated with the correlation coefficient of the test set (RT) and the root mean square error of the test set (RMSET). AWPSO-RF, AWPSO-XGBoost, and AWPSO-AdaBoost were finally selected as the base models. Without optimization, the RMSET values of RF, XGBoost, and AdaBoost were 0.226, 0.250, and 0.195, and their RT values were 0.871, 0.830, and 0.927, respectively. After AWPSO optimization, the RMSET values of AWPSO-RF, AWPSO-XGBoost, and AWPSO-AdaBoost were 0.226, 0.245, and 0.194, and their RT values were 0.871, 0.843, and 0.922, respectively. The stacking ensemble achieved an RMSET of 0.174 and an RT of 0.932. The prediction accuracy and generalization ability of the multi-model stacking ensemble are superior to those of the single-model regression methods.
Therefore, the stacking ensemble learning method that combines AdaBoost, RF, and XGBoost regression models is effective and feasible for assisting in the detection of berberine content in Rhizoma Coptidis.
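The stacking scheme described above can be illustrated with scikit-learn's `StackingRegressor`. This is a minimal sketch under several assumptions that the abstract does not state: synthetic data from `make_friedman1` stands in for the FT-NIR spectra and berberine values, `GradientBoostingRegressor` replaces XGBoost so that only scikit-learn is required, a `LinearRegression` meta-learner is assumed, the base models use fixed illustrative hyperparameters rather than AWPSO-tuned ones, and `rmse_and_r` is a hypothetical helper that computes RMSET and RT (Pearson correlation) on the held-out test set.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse_and_r(y_true, y_pred):
    """Hypothetical helper: return (RMSE, Pearson correlation) for one prediction set."""
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return rmse, r


# Synthetic stand-in for FT-NIR spectra (X) and berberine content (y); assumed data.
X, y = make_friedman1(n_samples=300, n_features=10, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level-0 base learners. GradientBoostingRegressor stands in for XGBoost (assumption).
base_models = [
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbdt_as_xgb", GradientBoostingRegressor(random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=100, random_state=0)),
]

# Level-1 meta-learner (assumed linear) is trained on 5-fold out-of-fold
# predictions of the base learners; StackingRegressor clones the estimators.
stack = StackingRegressor(
    estimators=base_models, final_estimator=LinearRegression(), cv=5
)
stack.fit(X_train, y_train)

# Fit each base learner on its own for comparison with the stacked model.
results = {}
for name, model in base_models:
    model.fit(X_train, y_train)
    results[name] = rmse_and_r(y_test, model.predict(X_test))
results["stacking"] = rmse_and_r(y_test, stack.predict(X_test))

for name, (rmse_t, r_t) in results.items():
    print(f"{name:12s} RMSE_T={rmse_t:.3f} R_T={r_t:.3f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply favoring whichever base model fits the training set most closely, which is the usual rationale for stacking over plain averaging.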
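The abstract does not give the AWPSO update rules, so the following sketch assumes one common fitness-adaptive inertia weight scheme, which is not necessarily the variant used in the study. For a minimization problem, a particle whose fitness $f_i$ is at or better than the swarm average $f_{avg}$ gets $w_i = w_{min} + (w_{max} - w_{min})\,(f_i - f_{min})/(f_{avg} - f_{min})$, and a worse particle keeps $w_i = w_{max}$. The function name `awpso` and all its default parameter values are hypothetical. In the study, the objective would be something like the cross-validated RMSE of a base model as a function of its hyperparameters; a toy quadratic with its minimum at (3, -2) stands in here so the example runs quickly.

```python
import numpy as np


def awpso(objective, bounds, n_particles=20, n_iter=100,
          w_min=0.4, w_max=0.9, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using PSO with a fitness-adaptive inertia weight.

    Hypothetical helper: the adaptive rule below is one common variant, assumed here.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = lo.size
    v_max = 0.2 * (hi - lo)  # clamp velocities to 20% of each search range

    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    fit = np.array([objective(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        # Adaptive inertia weight: particles at or better than the swarm average get
        # a smaller w (local refinement); worse particles keep w_max (exploration).
        f_min, f_avg = fit.min(), fit.mean()
        w = np.full(n_particles, w_max)
        better = fit <= f_avg
        if f_avg > f_min:
            w[better] = w_min + (w_max - w_min) * (fit[better] - f_min) / (f_avg - f_min)
        else:
            w[:] = w_min  # all particles share one fitness value

        # Standard velocity/position update with cognitive (c1) and social (c2) terms.
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w[:, None] * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lo, hi)

        # Re-evaluate, then update personal and global bests.
        fit = np.array([objective(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(np.argmin(pbest_fit))
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest, float(gbest_fit)


# Toy objective with its minimum at (3, -2); in practice this would be a
# cross-validated RMSE of a base model evaluated at the candidate hyperparameters.
best, best_fit = awpso(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2,
                       bounds=[(-10, 10), (-10, 10)])
print(best, best_fit)
```

Integer hyperparameters such as the number of trees would need to be rounded inside the objective before fitting a model; this sketch searches a continuous box only.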