Radix Bupleuri has been widely used for its numerous pharmacological effects. However, its safety and efficacy are difficult to evaluate because the concentrations of its components are strongly affected by the surrounding environment. Radix Bupleuri samples from different regions and varieties were therefore collected. Based on experimental and computed Raman spectra, machine learning methods, namely linear discriminant analysis (LDA), support vector machine (SVM), eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM), were applied to uncover obscured spectral characteristics. After dimension reduction by LDA, SVM, XGBoost and LightGBM models were trained for classification and regression prediction of Bupleurum production regions. Support vector classifiers achieved the best accuracy, 98%, with an F1 score above 0.96 on the test set. Support vector regression fitted the data well, with an R² score above 0.90 and a relatively low mean squared error. However, the more complex models were prone to overfitting, resulting in poor generalization. Among these machine learning models, the typical LDA-SVM models, whose results were consistent with those of high-performance liquid chromatography, demonstrated excellent performance and stability. We envision that this rapid classification and regression technique can be extended to predictions for other herbs. © 2024 Society of Chemical Industry.
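The LDA-SVM workflow summarized above (LDA for dimension reduction, then a support vector classifier scored by accuracy and F1) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "spectra" are hypothetical synthetic curves (one Gaussian peak per region plus noise), and the scaler, RBF kernel, `C=1.0`, the 75/25 split and the helper `make_synthetic_spectra` are all illustrative choices not taken from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)


def make_synthetic_spectra(n_per_class=60, n_classes=3, n_points=50):
    """HYPOTHETICAL stand-in for Raman spectra: each 'region' has a
    Gaussian peak at a different position, plus Gaussian noise."""
    axis = np.linspace(0.0, 1.0, n_points)
    X, y = [], []
    for label in range(n_classes):
        centre = 0.2 + 0.3 * label  # peak position differs per region
        peak = np.exp(-((axis - centre) ** 2) / (2 * 0.05 ** 2))
        for _ in range(n_per_class):
            X.append(peak + 0.1 * rng.standard_normal(n_points))
            y.append(label)
    return np.array(X), np.array(y)


X, y = make_synthetic_spectra()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# LDA projects onto at most (n_classes - 1) discriminant axes;
# the SVM then classifies in that reduced space.
model = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=2),
    SVC(kernel="rbf", C=1.0),
)
model.fit(X_train, y_train)

pred = model.predict(X_test)
acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred, average="macro")  # macro-averaged over regions
print(f"accuracy={acc:.2f}, macro-F1={f1:.2f}")
```

For the regression task, `SVC` would be swapped for `sklearn.svm.SVR` and scored with `r2_score` and `mean_squared_error`; what continuous target the study regressed on is not stated here, so that variant is left as an assumption.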