Background: As demand for traditional Chinese medicine (TCM) materials grows in China and worldwide, there is an urgent need for a simple and effective technology to identify the origin and quality of these materials and to ensure the safety of clinical medication. Mineral element analysis and isotope fingerprinting are the two techniques most commonly used in traditional origin identification. Both require chemometric methods in the identification process. Although they offer high accuracy and sensitivity, they are expensive and inefficient. Near-infrared spectroscopy is a fast, nondestructive, and widely used identification technique developed in recent years, but its results are susceptible to the state of the samples and to environmental conditions, and its sensitivity is low. Hyperspectral imaging combines the advantages of imaging technology and optical technology: it simultaneously acquires image information and spectral information, which reflect the external characteristics, internal physical structure, and chemical composition of the samples. Hyperspectral imaging is widely applied to agricultural product inspection, but research into its application to origin and quality identification of TCM materials is rare. Methods: In this study, the discriminative marginalized least squares regression (DMLSR) algorithm framework was used for feature extraction from frankincense hyperspectral data. DMLSR, with an intraclass compactness graph and manifold regularization, can efficiently learn projected samples with higher separability and less redundant information than the original samples. Then, discriminative collaborative representation with Tikhonov regularization (DCRT) was applied to classify the geographical origin and level of frankincense.
DCRT introduces a discriminant regularization term and incorporates spectral information divergence (SID) as its measurement method. SID is more sensitive to spectral differences, which makes DCRT better suited to the frankincense spectral data than a support vector machine (SVM). Results: For the origin classification task, samples of all levels from each origin were selected for three-way classification. We used 10-fold cross-validation to select the model parameters. After obtaining the optimal parameters, we randomly split the data into a training set (70%) and a testing set (30%). After repeating this random split 10 times, we obtained the final average classification accuracy, which is higher than 90%, with a generally small standard deviation. For the level classification task, samples of each level from the three origins were selected separately for multiclass classification. We randomly selected the training set and testing set from each origin. The level classification results for the three origins are good on the D4350 data, with the classification accuracy of each level generally above 80%. Conclusion: Experiments and analysis show that our algorithm framework has excellent classification performance, is stable in origin classification, and has potential for generalization. The experiments also show that, within our framework, different classification tasks need different data sources to achieve better classification and recognition: the origin classification task uses the frankincense D3000 data, and the level classification task uses the frankincense D4350 data.
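The SID measure named in the Methods and the repeated 70%/30% random-split protocol described in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sid`, `repeated_split_accuracy`, and `nearest_by_sid` are hypothetical, the nearest-neighbour rule is only a stand-in classifier (it is not DCRT, and DMLSR feature extraction is not reproduced), and the population standard deviation is assumed for the spread over repeats.

```python
import math
import random
import statistics


def sid(x, y, eps=1e-12):
    """Spectral information divergence between two non-negative spectra.

    Assumes both spectra have the same number of bands and a positive sum.
    """
    # Normalise each spectrum so its bands sum to 1 (a probability vector).
    sx, sy = sum(x), sum(y)
    p = [max(v / sx, eps) for v in x]
    q = [max(v / sy, eps) for v in y]
    # SID is the sum of the two relative entropies D(p||q) + D(q||p).
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))


def nearest_by_sid(train_x, train_y, query):
    """Stand-in classifier (NOT DCRT): label of the SID-nearest training spectrum."""
    best = min(range(len(train_x)), key=lambda i: sid(train_x[i], query))
    return train_y[best]


def repeated_split_accuracy(samples, labels, classify,
                            repeats=10, train_frac=0.7, seed=0):
    """Mean and standard deviation of accuracy over repeated random splits."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))  # 70% train, 30% test by default
        train, test = idx[:cut], idx[cut:]
        train_x = [samples[i] for i in train]
        train_y = [labels[i] for i in train]
        correct = sum(classify(train_x, train_y, samples[j]) == labels[j]
                      for j in test)
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies), statistics.pstdev(accuracies)
```

Because SID compares normalised spectra, it responds to the shape of a spectrum and not to its overall intensity, which is one plausible reading of why the abstract calls it more sensitive to the spectrum. In practice, any classifier with the same `(train_x, train_y, query)` signature could be passed to `repeated_split_accuracy` in place of the stand-in.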