This study addresses the gap in understanding how sample quantity, traceability range, and shelf life affect the accuracy of mung bean origin traceability models based on near-infrared spectroscopy. Mung beans from four origins in China were used: Baicheng City, Jilin Province; Dorbod Mongol Autonomous County and Tailai County, Heilongjiang Province; and Sishui County, Shandong Province. Near-infrared spectra of the mung bean samples were acquired (12,000-4000 cm⁻¹) and preprocessed (Standardization, Savitzky-Golay, Standard Normal Variate, and Multiplicative Scatter Correction). Principal component analysis showed that the first three principal components had a total cumulative variance contribution rate of 98.16%, and, combined with the K-nearest neighbor method, the overall correct discrimination rate for the four origins was 98.67%. We further investigated how varying sample quantities, traceability ranges, and shelf lives influenced the discrimination accuracy. Our results indicated a 4% increase in the overall correct discrimination rate. Specifically, larger traceability ranges (Tailai-Sishui) improved the accuracy by over 2%, and multiple shelf lives (90, 180, 270, and 360 d) enhanced the accuracy by 7.85%. These findings underscore the critical role of sample quantity and diversity in traceability studies, suggesting that broader traceability ranges and comprehensive sample collections across different shelf lives can significantly improve the accuracy of origin discrimination models.
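As an illustration of two of the steps named above, the following minimal Python sketch applies Standard Normal Variate (SNV) preprocessing to each spectrum and then classifies held-out spectra with a K-nearest neighbor vote. It is not the authors' implementation: the spectra are synthetic, the origin labels (`origin_A`, `origin_B`, `origin_C`), band positions, noise levels, and the choice of k = 3 are all assumptions made for the example, and the principal component analysis step is omitted for brevity.

```python
import math
import random
from collections import Counter


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own mean and std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training spectra closest in Euclidean distance."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def synthetic_spectrum(rng, peak, n_points=50):
    """Hypothetical spectrum: one Gaussian band at `peak` plus scatter effects and noise."""
    gain = rng.uniform(0.8, 1.2)     # assumed multiplicative scatter
    offset = rng.uniform(-0.1, 0.1)  # assumed additive baseline shift
    return [
        gain * math.exp(-((i - peak) ** 2) / 50.0) + offset + rng.gauss(0, 0.01)
        for i in range(n_points)
    ]


rng = random.Random(0)
# Placeholder origin labels and band positions (not from the study).
origins = {"origin_A": 15, "origin_B": 25, "origin_C": 35}

train_X, train_y, test_X, test_y = [], [], [], []
for label, peak in origins.items():
    for i in range(30):
        spec = snv(synthetic_spectrum(rng, peak))
        if i < 20:  # first 20 spectra per origin for training
            train_X.append(spec)
            train_y.append(label)
        else:       # remaining 10 per origin held out for testing
            test_X.append(spec)
            test_y.append(label)

predictions = [knn_predict(train_X, train_y, s, k=3) for s in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"hold-out accuracy on synthetic data: {accuracy:.2%}")
```

Because SNV removes the per-spectrum gain and offset, spectra from the same synthetic origin become nearly identical before the distance calculation, which is why a simple Euclidean K-nearest neighbor vote separates them. Real near-infrared spectra overlap far more, so accuracy on this toy data says nothing about the rates reported in the study.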