Safflower seed oil (SSO) is an important edible oil valued in the human diet for its distinctive nutritional components. Because of its high economic value, SSO has become a primary target for food adulteration. To detect the concentration of adulterants in SSO rapidly and effectively, this study combines hyperspectral imaging and gas chromatography-mass spectrometry (GC-MS) with machine learning. Several preprocessing methods were compared, and median filtering (MF) was selected for the spectral data because it significantly reduced noise and improved the robustness and generalization ability of the model. In feature band selection, bands around 440 nm, 530 nm, and 880 to 950 nm were identified as the most favorable for building a predictive model of adulterant concentration in SSO, and restricting the model to these bands also shortened the modeling time. An ensemble learning model, with ridge regression (Ridge) and partial least squares regression (PLSR) as base models and LightGBM as the meta-model, achieved high-precision prediction of the concentration of adulterants in SSO. Furthermore, jointly modeling the linoleic acid, oleic acid, and palmitic acid contents measured by GC-MS with the hyperspectral data improved the model's R² to 0.976. This study therefore identifies MF-Ridge-Stacking as the optimal model for predicting the concentration of adulterants in SSO. The research provides theoretical and technical support for adulteration identification and presents a new method for predicting adulterant concentration in SSO, with potential practical application value.
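As an illustration of the median filtering (MF) preprocessing step mentioned above, the following is a minimal sketch in plain Python. The abstract does not state the window size or the implementation used, so the function name `median_filter`, the window of 3, the edge handling, and the reflectance values are all assumptions made for illustration only.

```python
from statistics import median


def median_filter(spectrum, window=3):
    """Return a median-filtered copy of a 1-D spectrum.

    Each point is replaced by the median of its neighbours inside a
    sliding window. Near the edges the window is truncated, so the
    output has the same length as the input. (Assumed edge handling;
    the original study may have used a different convention.)
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    return [
        median(spectrum[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]


# Hypothetical reflectance values with a single noise spike at index 3.
raw = [0.40, 0.41, 0.42, 0.95, 0.43, 0.44, 0.45]
smoothed = median_filter(raw, window=3)
print(smoothed)
```

In this sketch the isolated spike is replaced by a value close to its neighbours while the overall trend of the spectrum is kept, which is the property that makes median filtering attractive for impulsive noise in spectral data.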