To identify the geographical origin of millet accurately, 36 samples each of Guangling, Qinzhouhuang, Liuseng, and Qiananhuang millet and 33 samples of Yuzhou millet were collected, and the mid-infrared (mid-IR) spectrum of every sample was recorded. The spectra were preprocessed by denoising, standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), and normalization. Principal component analysis (PCA) was used to reduce the dimensionality of the data, and a support vector machine (SVM) was then trained on the principal component scores to identify the geographical origin of the five kinds of millet. Recognition accuracy was highest when the first 12 principal components were used, reaching 99.2% on the training set and 98.3% on the prediction set, which indicates that the mid-IR spectroscopic identification model is feasible and effective. PCA, window analysis, hierarchical clustering analysis, and SVM were then combined to extract feature information from the mid-IR spectra of millet from the five producing areas. Five wavenumbers (1026, 1053, 1685, 1715, and 1744 cm⁻¹) were found to be only weakly correlated with one another, and a model based on these five features achieved recognition accuracies of 95.8% on the training set and 100.0% on the prediction set. The feature extraction method established here could improve the prediction efficiency of the identification model and provide data support for the analysis of differential components.
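The workflow described above (SNV preprocessing, PCA reduction to 12 components, SVM classification) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the spectra are synthetic stand-ins produced by a hypothetical helper, `make_synthetic_spectra`, and the 118/59 train/prediction split, the RBF kernel, and the SVM hyperparameters are all assumptions that the abstract does not state. Only the class sizes (36, 36, 36, 36, 33) and the number of principal components (12) come from the text. Denoising, MSC, and normalization are omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def make_synthetic_spectra(counts, n_points=400, seed=0):
    """Hypothetical stand-in for measured mid-IR spectra.

    Each class gets one Gaussian band at a class-specific position, with a
    random multiplicative scale, baseline offset, and additive noise.
    """
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, n_points)
    X, y = [], []
    for label, n in enumerate(counts):
        center = 0.2 + 0.15 * label
        band = np.exp(-((axis - center) ** 2) / (2 * 0.03 ** 2))
        for _ in range(n):
            scale = rng.uniform(0.8, 1.2)    # multiplicative scatter effect
            offset = rng.uniform(-0.1, 0.1)  # baseline shift
            noise = rng.normal(0.0, 0.02, n_points)
            X.append(scale * band + offset + noise)
            y.append(label)
    return np.array(X), np.array(y)


# Class sizes follow the abstract: four origins with 36 samples, one with 33.
X, y = make_synthetic_spectra([36, 36, 36, 36, 33])

# Assumed split: 59 samples held out for prediction, 118 used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=59, stratify=y, random_state=0
)

# SNV -> PCA (first 12 components) -> SVM; kernel and C are assumptions.
model = make_pipeline(
    FunctionTransformer(snv),
    PCA(n_components=12),
    SVC(kernel="rbf", C=10.0, gamma="scale"),
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"training accuracy:   {train_acc:.3f}")
print(f"prediction accuracy: {test_acc:.3f}")
```

Placing the preprocessing and PCA steps inside one pipeline means that the principal components are fitted on the training spectra only and then applied unchanged to the prediction set, which avoids leaking information from the held-out samples into the model.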