The geographical origin of Panax ginseng significantly influences its nutritional value and chemical composition, which in turn affects its market price. Traditional methods for analyzing these differences are time-consuming and consume substantial quantities of reagents, which makes them inefficient. Therefore, hyperspectral imaging (HSI) in conjunction with X-ray technology was used for the rapid and non-destructive traceability of Panax ginseng origin. First, outlier samples were rejected using a combination of the isolation forest algorithm and the density peak clustering (DPC) algorithm. Subsequently, random forest (RF) and support vector machine (SVM) classification models were constructed using the spectral data extracted from the hyperspectral images. These models were further optimized by applying 72 preprocessing methods and their combinations. Additionally, to enhance model performance, four variable screening algorithms were employed: SelectKBest, genetic algorithm (GA), least absolute shrinkage and selection operator (LASSO), and permutation feature importance (PFI). The optimized model, which combines second derivative, auto scaling, permutation feature importance, and support vector machine (2nd Der-AS-PFI-SVM), achieved a prediction accuracy of 93.4 %, a Kappa value of 0.876, a Brier score of 0.030, an F1 score of 0.932, and an AUC of 0.994 on an independent prediction set. Moreover, the image data (color information and texture information) extracted from color and X-ray images were used to construct classification models and evaluate their performance. Among them, the SVM model constructed using texture information from X-ray images performed best, achieving a prediction accuracy of 63.0 % on the validation set, with a Brier score of 0.181, an F1 score of 0.518, and an AUC of 0.553.
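As an illustration of how the evaluation metrics reported above can be computed, the following is a minimal sketch in plain Python. It is not the authors' code: the labels and probabilities are made-up toy values for three hypothetical origins, the F1 score is assumed to be macro-averaged, and the Brier score follows the multi-class convention (sum of squared errors over classes, averaged over samples). The abstract does not state which conventions the study used.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted origin matches the true origin."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def brier_score(y_true, proba):
    """Multi-class Brier score: squared error against one-hot labels."""
    total = 0.0
    for t, row in zip(y_true, proba):
        total += sum((p - (1.0 if c == t else 0.0)) ** 2
                     for c, p in enumerate(row))
    return total / len(y_true)

# Toy data (hypothetical): three origins encoded as 0, 1, 2.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
# Hypothetical probabilities: 0.8 for the predicted class, 0.1 for the others.
proba = [[0.8 if c == p else 0.1 for c in range(3)] for p in y_pred]

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
brier = brier_score(y_true, proba)
```

Lower Brier scores and higher accuracy, Kappa, and F1 values indicate better models, which is the direction of comparison used for the figures quoted in the abstract.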
Mid-level and high-level data fusion were then implemented based on the Stacking strategy. The model employing high-level fusion of hyperspectral spectral information and X-ray image texture information significantly outperformed the model using only hyperspectral spectral information. This fused model attained a prediction accuracy of 95.2 %, a Kappa value of 0.912, a Brier score of 0.027, an F1 score of 0.952, and an AUC of 0.997 on the independent prediction set. In summary, this study not only provides a novel technical path for fast and non-destructive traceability of Panax ginseng origin, but also demonstrates the great potential of combining HSI and X-ray technology for the traceability of products used as both medicine and food.
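The idea behind high-level (decision-level) fusion with stacking can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes a two-origin problem, takes the base-model probabilities as given (the values below are invented), and uses a small logistic-regression meta-learner fitted by gradient descent, a choice the abstract does not specify. In practice the base-model probabilities would be out-of-fold predictions, and the fused model would be scored on held-out samples rather than on its own training data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(meta_features, labels, lr=1.0, epochs=20000):
    """Fit a logistic-regression meta-learner by batch gradient descent."""
    n = len(labels)
    weights = [0.0] * len(meta_features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(meta_features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, x):
    """Fused probability that a sample belongs to origin 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical base-model outputs: probability of origin 1 from a spectral
# model and from an X-ray texture model, for eight toy samples.
labels     = [0, 0, 0, 0, 1, 1, 1, 1]
p_spectral = [0.10, 0.20, 0.35, 0.55, 0.45, 0.70, 0.85, 0.90]
p_texture  = [0.40, 0.55, 0.30, 0.20, 0.80, 0.45, 0.60, 0.50]

# High-level fusion: the meta-learner sees only the base models' decisions.
meta_features = list(zip(p_spectral, p_texture))
weights, bias = fit_meta_learner(meta_features, labels)

fused_proba = [predict_proba(weights, bias, x) for x in meta_features]
fused_pred = [int(p >= 0.5) for p in fused_proba]
spectral_pred = [int(p >= 0.5) for p in p_spectral]

spectral_acc = sum(p == y for p, y in zip(spectral_pred, labels)) / len(labels)
fused_acc = sum(p == y for p, y in zip(fused_pred, labels)) / len(labels)
```

On this toy data the spectral-only rule misclassifies two borderline samples, while the fused model separates all eight, because the texture probabilities carry complementary information even though they are weak on their own. The gain on the toy data is illustrative only and says nothing about the size of the improvement in the study.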