This paper proposes a data-driven classification model for traditional Chinese medicinal herbs based on mid-infrared spectral data. To address the limitations of traditional identification methods when an herb’s appearance is damaged or incomplete, the study applies machine learning to spectral data that have been preprocessed and reduced to a small set of features. First, Savitzky-Golay convolution smoothing and the Standard Normal Variate (SNV) transformation are used to denoise and normalize the spectra. Then, Principal Component Analysis (PCA) is employed to reduce the dimensionality of the high-dimensional spectral data and extract the key features. Finally, a Gaussian Mixture Model (GMM) is applied to cluster the reduced data, grouping the medicinal herbs into six classes. The results show that the method produces accurate and stable classifications. The constructed model is applicable to the classification and origin identification of medicinal herbs and also provides a useful reference for the classification and origin identification of other plant species.
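The four steps described above (Savitzky-Golay smoothing, SNV, PCA, GMM clustering into six groups) can be sketched roughly as follows. This is a minimal illustration using NumPy, SciPy and scikit-learn, not the authors' implementation: the function names `snv` and `cluster_spectra`, the window length, polynomial order, number of principal components and random seed are all assumptions, since the abstract does not state them, and the demo data are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum (row) on its own.

    Assumes no spectrum is perfectly constant (its standard deviation is non-zero).
    """
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def cluster_spectra(spectra, n_clusters=6, window_length=11, polyorder=2,
                    n_components=5, random_state=0):
    """Smooth, normalize, reduce and cluster spectra of shape (n_samples, n_points).

    Only n_clusters=6 comes from the abstract; every other default is an
    illustrative assumption.
    """
    # 1. Savitzky-Golay smoothing along the wavenumber axis.
    smoothed = savgol_filter(spectra, window_length=window_length,
                             polyorder=polyorder, axis=1)
    # 2. SNV normalization of each smoothed spectrum.
    normalized = snv(smoothed)
    # 3. PCA to obtain a low-dimensional score matrix.
    pca = PCA(n_components=n_components, random_state=random_state)
    scores = pca.fit_transform(normalized)
    # 4. Gaussian Mixture Model clustering of the PCA scores.
    gmm = GaussianMixture(n_components=n_clusters, random_state=random_state)
    labels = gmm.fit_predict(scores)
    return labels, scores


if __name__ == "__main__":
    # Synthetic stand-in for real spectra: six groups, each with one
    # Gaussian peak at a different position plus a little noise.
    rng = np.random.default_rng(0)
    axis = np.linspace(0.0, 1.0, 200)
    spectra = np.vstack([
        np.exp(-((axis - c) ** 2) / 0.005) + 0.02 * rng.standard_normal(axis.size)
        for c in np.linspace(0.15, 0.85, 6)
        for _ in range(20)
    ])
    labels, scores = cluster_spectra(spectra)
    print(labels.shape, scores.shape)
```

Because a GMM is unsupervised, the returned cluster indices are arbitrary; mapping them to herb classes or origins would require known reference samples. In practice the number of principal components would typically be chosen from the explained variance rather than fixed in advance.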