The allure of substantial profits has sustained an illicit trade in baijiu bearing counterfeit vintage labels. Although various approaches have been used to determine the vintage of baijiu, many of them are both costly and time-consuming. This work pioneered the use of attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy, coupled with chemometric analysis, as a non-destructive and economically viable method for discriminating sauce-flavor baijiu across different aging periods (1-, 2-, and 3-year). Principal component analysis (PCA) was first conducted to explore clustering trends among the vintage groups. The effect of spectral pre-processing on modeling performance was then examined. For wavelength selection, four methods (ReliefF, random forest variable importance (RFVI), variable importance in projection (VIP), and Venn) were first used to identify a subset of candidate features that potentially best mapped the vintage labels. Sequential backward selection (SBS) was then applied within this candidate subset as a secondary feature reduction step, both to further improve the identification capability of the model and to remove any redundant data still present. The combination of these two steps is termed a “hybrid wavelength selection strategy.” Additionally, the dimensionality reduction effects of PCA and kernel principal component analysis (KPCA) were compared to demonstrate the robustness of the proposed method. Finally, classification models based on partial least squares discriminant analysis (PLS-DA), random forest (RF), and a grasshopper optimization algorithm-based support vector machine (GOA-SVM) were developed.
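The two-stage logic of the hybrid wavelength selection strategy can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: a Fisher-style variance ratio stands in for the filter step (ReliefF, RFVI, VIP, or Venn), a nearest-centroid classifier scored by leave-one-out accuracy stands in for GOA-SVM, and the data are synthetic. All function names (`fisher_scores`, `loo_accuracy`, `sbs`, `hybrid_select`) are hypothetical.

```python
import random
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Filter step (stand-in for ReliefF/RFVI/VIP): between-class over
    within-class variance for each feature (wavelength)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / (within + 1e-12))
    return scores

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (stand-in for the GOA-SVM wrapper) restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            centroids[c] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(centroids, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == y[i]
    return correct / len(X)

def sbs(X, y, candidates, min_features=1):
    """Sequential backward selection: at each step drop the feature whose
    removal scores best, and remember the best subset seen so far."""
    current = list(candidates)
    best_subset, best_score = list(current), loo_accuracy(X, y, current)
    while len(current) > min_features:
        trials = [(loo_accuracy(X, y, [f for f in current if f != drop]), drop)
                  for drop in current]
        score, drop = max(trials)
        current.remove(drop)
        if score >= best_score:  # prefer the smaller subset on ties
            best_subset, best_score = list(current), score
    return best_subset, best_score

def hybrid_select(X, y, n_candidates):
    """Stage 1: keep the top-ranked candidates. Stage 2: prune them with SBS."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sbs(X, y, sorted(ranked[:n_candidates]))

# Synthetic "spectra": 8 features, of which only 2 and 5 carry vintage information.
rng = random.Random(0)
y = [label for label in ("1-year", "2-year", "3-year") for _ in range(10)]
X = []
for label in y:
    shift = {"1-year": 0.0, "2-year": 3.0, "3-year": 6.0}[label]
    row = [rng.gauss(0, 1) for _ in range(8)]
    row[2] += shift
    row[5] -= shift
    X.append(row)

subset, score = hybrid_select(X, y, n_candidates=4)
print("selected features:", subset, "LOO accuracy:", round(score, 3))
```

The filter step is cheap and narrows the search, so the costlier wrapper step (SBS retrains the classifier for every candidate removal) only has to run on a small subset. With real spectra, the candidate count and the wrapper classifier would need to be chosen to match the study design.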
The results show that the spectral data need not be pre-processed and that the proposed hybrid wavelength selection strategy can further improve the identification ability of the model. Among the models developed, ReliefF-SBS-GOA-SVM was the best-performing classifier, yielding accuracy, sensitivity, and specificity of 94.44%, 95.23%, and 94.44%, respectively. This method not only holds promise for discriminating baijiu class attributes such as brand, origin, flavor, and vintage but also shows potential applicability in other non-targeted identification studies based on spectroscopic methods.
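For readers unfamiliar with how accuracy, sensitivity, and specificity are obtained in a three-class problem, the sketch below computes them one-vs-rest and macro-averages over classes. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the labels are toy data rather than the study's predictions. The function name `one_vs_rest_metrics` is hypothetical.

```python
from statistics import mean

def one_vs_rest_metrics(y_true, y_pred):
    """Return (accuracy, macro sensitivity, macro specificity).
    Assumes at least two classes are present in y_true."""
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    sensitivities, specificities = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        sensitivities.append(tp / (tp + fn))  # true positive rate for class c
        specificities.append(tn / (tn + fp))  # true negative rate for class c
    return accuracy, mean(sensitivities), mean(specificities)

# Toy example: 12 samples, one 2-year sample misclassified as 3-year.
y_true = ["1y"] * 4 + ["2y"] * 4 + ["3y"] * 4
y_pred = ["1y"] * 4 + ["2y"] * 3 + ["3y"] * 5

acc, sens, spec = one_vs_rest_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

A single misclassification lowers the sensitivity of the true class and the specificity of the predicted class, which is why the three figures can differ even on a small test set.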