Study on Rapid Detection of Original Distillate of Strong-Aroma Baijiu Acid and Ester Component Concentration by Machine Learning Combined with Fourier Transform Infrared
Accurate detection of the key acid and ester components in the original distillate of strong-aroma Baijiu is crucial for Baijiu quality control and provides a scientific basis for blending and storage processes. Traditional detection methods primarily rely on sensory evaluation combined with gas chromatography analysis. These methods are time-consuming and subject to operator bias. To address this, we propose a novel hybrid framework integrating Fourier transform infrared (FTIR) spectroscopy and machine learning for rapid detection. FTIR spectra of original distillate samples of strong-aroma Baijiu from different distillation periods were collected and preprocessed using Savitzky-Golay (S-G) smoothing, first-order derivative, and standard normal variate (SNV) transformation. Feature selection was carried out with the least absolute shrinkage and selection operator (LASSO) and competitive adaptive reweighted sampling (CARS). Predictive models were then constructed using partial least squares regression (PLSR), long short-term memory (LSTM) networks, and least squares boosting (LSBoost). The parameters of these models were optimized by the Grey Wolf Optimization (GWO) algorithm. The results show that the LASSO-GWO-LSTM model achieved the best performance in predicting the concentrations of ethyl acetate and lactic acid. For the ethyl acetate model, the root mean square error of prediction (RMSE_P), R-squared of prediction (R²_P), and relative percentage difference of prediction (RPD_P) reached 0.0967, 0.9949, and 14.6591, respectively. For the lactic acid prediction model, the corresponding values were 0.0152, 0.9805, and 7.0378. These results far exceed those of conventional PLSR models, indicating that the combination of FTIR and machine learning (LASSO-GWO-LSTM) is a promising method for real-time, high-precision, non-destructive analysis of acid and ester concentrations in the original distillate of strong-aroma Baijiu.
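The three prediction-set figures of merit quoted in this abstract can be computed from reference and predicted values as in the minimal sketch below. It assumes RPD is the ratio of the standard deviation of the reference values to the RMSE of prediction, and it uses the population standard deviation (dividing by n); some authors divide by n - 1 instead. The function name and the sample values are hypothetical and are not taken from the paper.

```python
import math

def prediction_metrics(y_true, y_pred):
    """Return (RMSEP, R2 of prediction, RPD) for one prediction set."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares over the prediction set
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    rmsep = math.sqrt(sse / n)
    r2p = 1.0 - sse / sst
    # Assumption: population SD (divide by n); n - 1 is also common
    sd = math.sqrt(sst / n)
    rpd = sd / rmsep
    return rmsep, r2p, rpd

# Hypothetical reference and predicted concentrations, for illustration only
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.9, 5.1]
rmsep, r2p, rpd = prediction_metrics(y_ref, y_hat)
print(rmsep, r2p, rpd)
```

With this particular choice of standard deviation, RPD equals 1 / sqrt(1 - R²), so a high R² and a high RPD carry the same information about the same prediction set.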
- # Competitive Adaptive Reweighted Sampling
- # Least Absolute Shrinkage And Selection Operator
- # Mean Square Error Of Prediction
- # Square Error Of Prediction
- # Standard Normal Variate
- # Grey Wolf Optimization
- # Partial Least Squares Regression
- # Fourier Transform Infrared
- # Ester Component
- # Long Short-term Memory
- Research Article
- 10.3390/agriculture15141557
- Jul 21, 2025
- Agriculture
Calorific value and moisture content are the key indices for evaluating the quality and combustion characteristics of Caragana pellet fuel. Calorific value measures the energy released by energy plants during combustion and thus determines energy utilization efficiency. At present, however, the calorific value of solid fuel is still determined in the laboratory by oxygen bomb calorimetry, which seriously hinders large-scale, rapid detection of fuel particles on industrial production lines. In response to this technical challenge, this study proposes using hyperspectral imaging technology combined with various chemometric methods to establish quantitative models for determining moisture content and calorific value in Caragana korshinskii fuel. A hyperspectral imaging system was used to capture spectral data in the 935–1720 nm range from 152 samples collected in multiple regions of the Inner Mongolia Autonomous Region. For moisture content and calorific value, three quantitative detection models were established: partial least squares regression (PLSR), random forest regression (RFR), and extreme learning machine (ELM). Monte Carlo cross-validation (MCCV) was used to remove outliers from the raw spectral data to improve model accuracy. Four preprocessing methods were applied to the spectral data; standard normal variate (SNV) preprocessing performed best for the moisture content model, and Savitzky–Golay (SG) preprocessing performed best for the calorific value model.
Meanwhile, to improve prediction accuracy and reduce redundant wavelength data, four feature extraction methods were chosen: competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA), genetic algorithm (GA), and iteratively retaining informative variables (IRIV). These were combined with the three models to build quantitative detection models based on the characteristic wavelengths for moisture content and calorific value of Caragana korshinskii fuel. Finally, a comprehensive comparison of all modeling combinations was carried out. The SNV-IRIV-PLSR combination was the best for moisture content prediction, with a prediction set determination coefficient (R²_P), root mean square error of prediction (RMSEP), and relative percentage deviation (RPD) of 0.9693, 0.2358, and 5.6792, respectively. A moisture content distribution map of Caragana fuel particles was also established using this model. The SG-CARS-RFR combination was the best for calorific value prediction, with R²_P, RMSEP, and RPD of 0.8037, 0.3219, and 2.2864, respectively. This study provides an innovative technical solution for assessing the value and quality of Caragana fuel particles.
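As a rough illustration of the SNV preprocessing named in this abstract, the sketch below centres and scales each spectrum by its own mean and standard deviation. The spectra are hypothetical values, and the use of the sample standard deviation (n - 1) is an assumption; the abstract does not state which variant the authors used.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # assumption: sample SD (n - 1)
    return [(value - mean) / sd for value in spectrum]

# Hypothetical spectra: the second is the first with a multiplicative
# scatter factor (1.5) and an additive baseline offset (0.1)
base = [0.20, 0.35, 0.50, 0.42, 0.30]
scattered = [1.5 * value + 0.1 for value in base]

snv_base = snv(base)
snv_scattered = snv(scattered)
print(snv_base)
print(snv_scattered)
```

Both spectra map to the same SNV-corrected curve, which is why SNV is commonly applied to suppress scatter and baseline differences between samples before modeling.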
- Research Article
42
- 10.1038/srep30313
- Jul 29, 2016
- Scientific Reports
This paper investigated the feasibility of Fourier transform infrared transmission (FT-IR) spectroscopy to detect talcum powder illegally added to tea, based on chemometric methods. First, 210 samples of tea powder with 13 dose levels of talcum powder were prepared for FT-IR spectra acquisition. To highlight the slight variations in the FT-IR spectra, smoothing, normalization, and standard normal variate (SNV) were employed to preprocess the raw spectra. Among them, SNV preprocessing gave the best performance, with a high correlation of prediction (RP = 0.948) and a low root mean square error of prediction (RMSEP = 0.108) for the partial least squares (PLS) model. Then 18 characteristic wavenumbers were selected based on a hybrid of backward interval partial least squares (biPLS) regression, the competitive adaptive reweighted sampling (CARS) algorithm, and the successive projections algorithm (SPA). These characteristic wavenumbers accounted for only 0.64% of the full set of wavenumbers. The 18 characteristic wavenumbers were then used to build linear and nonlinear determination models by PLS regression and extreme learning machine (ELM), respectively. The optimal model, with RP = 0.963 and RMSEP = 0.137, was achieved by the ELM algorithm. These results demonstrated that FT-IR spectroscopy with chemometrics could be used successfully to detect talcum powder in tea.
- Research Article
9
- 10.1142/s1793545815500340
- Oct 27, 2015
- Journal of Innovative Optical Health Sciences
As an important process analysis tool, near infrared spectroscopy (NIRS) has been widely used in process monitoring. In the present work, the feasibility of NIRS for monitoring the moisture content of human coagulation factor VIII (FVIII) in the freeze-drying process was investigated. A partial least squares regression (PLS-R) model for moisture content determination was built with 88 samples. Different pre-processing methods were explored, and the best method found was standard normal variate (SNV) transformation combined with the first derivative with Savitzky–Golay (SG) 15-point smoothing. Then, four different variable selection methods, including uninformative variable elimination (UVE), interval partial least squares regression (iPLS), competitive adaptive reweighted sampling (CARS), and a manual method, were compared for eliminating irrelevant variables, and iPLS was chosen as the best variable selection method. The correlation coefficient (R), correlation coefficient of the calibration set (R cal), correlation coefficient of the validation set (R val), root mean square error of cross-validation (RMSECV), and root mean square error of prediction (RMSEP) of the PLS model were 0.9284, 0.9463, 0.8890, 0.4986%, and 0.4514%, respectively. The results showed that the model for moisture content determination has a wide range and good linearity, accuracy, and precision. The developed approach was shown to have potential for monitoring the moisture content of FVIII in the freeze-drying process.
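A first derivative with 15-point Savitzky-Golay smoothing, as named in this abstract, can be sketched in a few lines. The sketch assumes a quadratic local fit, for which the first-derivative weight at offset j is j divided by the sum of j² over the window; the abstract does not state the polynomial order, so that choice is an assumption. The function name and the test signal are hypothetical, and edge points without a full window are left as None rather than extrapolated.

```python
def sg_first_derivative(signal, half_window=7, step=1.0):
    """Savitzky-Golay first derivative for a quadratic fit over 2*half_window + 1 points.

    Points closer than half_window to either end are returned as None.
    """
    m = half_window
    # For a linear or quadratic local fit, the derivative weight at offset j is j / sum(j^2)
    norm = sum(j * j for j in range(-m, m + 1)) * step
    out = [None] * len(signal)
    for i in range(m, len(signal) - m):
        out[i] = sum(j * signal[i + j] for j in range(-m, m + 1)) / norm
    return out

# Hypothetical signal y = 0.5*x^2 + 2*x, whose exact derivative is x + 2
x = list(range(30))
y = [0.5 * v * v + 2.0 * v for v in x]
dy = sg_first_derivative(y)
print(dy[7], dy[10], dy[22])
```

Because the local fit is quadratic, the derivative of a quadratic signal is recovered exactly in the interior; on real spectra the same weights smooth noise while differentiating.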
- Research Article
27
- 10.1016/j.jfca.2022.104938
- Sep 27, 2022
- Journal of Food Composition and Analysis
Application of near-infrared hyperspectral imaging coupled with chemometrics for rapid and non-destructive prediction of protein content in single chickpea seed
- Research Article
74
- 10.1016/j.saa.2019.118005
- Jan 14, 2020
- Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
Rapid detection of adulteration of minced beef using Vis/NIR reflectance spectroscopy with multivariate methods.
- Research Article
63
- 10.1016/j.jpba.2008.09.054
- Oct 17, 2008
- Journal of Pharmaceutical and Biomedical Analysis
Quantitative solid-state analysis of three solid forms of ranitidine hydrochloride in ternary mixtures using Raman spectroscopy and X-ray powder diffraction
- Research Article
62
- 10.1016/j.infrared.2019.103034
- Sep 12, 2019
- Infrared Physics & Technology
Rapid prediction and visualization of moisture content in single cucumber (Cucumis sativus L.) seed using hyperspectral imaging technology
- Research Article
25
- 10.1016/j.biosystemseng.2019.06.010
- Jun 19, 2019
- Biosystems Engineering
Utilising near-infrared hyperspectral imaging to detect low-level peanut powder contamination of whole wheat flour
- Book Chapter
2
- 10.1007/978-981-19-4884-8_32
- Jan 1, 2022
Extreme learning machine (ELM) has received increasing attention in multivariate calibration of complex samples due to its fast learning speed and good generalization ability. However, variables in the spectral matrix that are irrelevant to the target can degrade the quality of ELM modeling. Therefore, variable selection is required before multivariate calibration. In this study, the least absolute shrinkage and selection operator (LASSO) combined with ELM (LASSO-ELM) is used for spectral quantitative analysis of complex samples. In the method, LASSO is first used to select variables by shrinking the regression coefficients of unselected variables to zero. The optimal model position s of LASSO is determined by the Sp criterion. Then an ELM model is built between the selected variables and the analyzed target, with the optimal activation function and hidden node number determined by the ratio of mean to standard deviation of correlation coefficients (MSR). Near infrared (NIR) spectra of tobacco lamina and ultraviolet (UV) spectra of fuel oil samples are used to evaluate the prediction performance of LASSO-ELM. Results show that, with only tens of variables, LASSO-ELM achieves the lowest root mean square error of prediction (RMSEP) and the highest correlation coefficient (R) compared with full-spectrum partial least squares (PLS) and ELM. Thus, LASSO-ELM is an effective variable selection and multivariate calibration method for quantitative analysis of complex samples.
Keywords: Least absolute shrinkage and selection operator; Extreme learning machine; Variable selection; Spectral analysis; Quantification
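The way LASSO selects variables by shrinking some coefficients exactly to zero can be illustrated with a small coordinate-descent sketch. It minimises (1/2n)·||y − Xb||² + λ·||b||₁ with a fixed, hand-picked λ, which is an assumption for illustration; the abstract instead chooses the model position with the Sp criterion. The data are a hypothetical toy design with three mutually orthogonal, zero-mean columns, and no intercept is fitted.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that ignores column j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

# Hypothetical toy data: y depends strongly on column 0, weakly on column 2
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0 * row[0] + 0.3 * row[2] for row in X]
coef = lasso_cd(X, y, lam=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print(coef, selected)
```

In this toy case only column 0 survives: the weak contribution of column 2 falls below the penalty and is set to zero, while the retained coefficient is shrunk from 2.0 to 1.5.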
- Research Article
- 10.30560/as.v7n1p84
- Apr 13, 2025
- Agricultural Science
This study explored the application of a portable near-infrared (NIR) spectrometer for analyzing silage corn feed quality, specifically focusing on developing a quantitative detection model for dry matter content. Spectral data were collected within the 855-1890 nm range using a portable NIR spectrometer, and the dataset was partitioned into calibration and prediction sets using the SPXY algorithm. An Extreme Learning Machine (ELM) model optimized by Particle Swarm Optimization (PSO) was employed for modeling. Five preprocessing methods were evaluated: Moving Average Filter (MAF), Savitzky-Golay Filter (SGF), Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV) transformation, and First Derivative (FD). To enhance model performance, feature wavelengths were selected using three methods: Bootstrap Soft Shrinkage (BOSS), Competitive Adaptive Reweighted Sampling (CARS), and Iterative Retained Information Variable (IRIV). The optimal model combining SNV preprocessing with BOSS feature selection achieved a prediction correlation coefficient of 0.8708 and a Root Mean Square Error of Prediction (RMSEP) of 0.6802. These results demonstrate the potential of portable NIR spectroscopy for rapid dry matter content determination in silage corn feed.
- Research Article
7
- 10.3390/s19143124
- Jul 15, 2019
- Sensors
Firmness changes in Nanguo pears under different freezing/thawing conditions have been characterized by hyperspectral imaging (HSI). Four different freezing/thawing conditions (critical temperature, number of cycles, holding time, and cooling rate) were set in this experiment. Four different pretreatment methods were used: multiplicative scatter correction (MSC), standard normal variate (SNV), Savitzky-Golay standard normal variate (S-G-SNV), and Savitzky-Golay multiplicative scatter correction (S-G-MSC). Combined with competitive adaptive reweighted sampling (CARS) to identify characteristic wavelengths, firmness prediction models of Nanguo pears under different freezing/thawing conditions were established by partial least squares (PLS) regression. The performance of the firmness model was analyzed quantitatively by the correlation coefficient (R), the root mean square error of calibration (RMSEC), the root mean square error of prediction (RMSEP), and the root mean square error of cross validation (RMSECV). The results showed that the MSC-PLS model has the highest accuracy at different cooling rates and holding times; the correlation coefficients of the calibration set (Rc) were 0.899 and 0.927, respectively, and the correlation coefficients of the validation set (Rp) were 0.911 and 0.948, respectively. The accuracy of the SNV-PLS model was the highest at different numbers of cycles, and the Rc and the Rp were 0.861 and 0.848, respectively. The RMSEC was 65.189, and the RMSEP was 65.404. The accuracy of the S-G-SNV-PLS model was the highest at different critical temperatures, with Rc and Rp values of 0.854 and 0.819, respectively, and RMSEC and RMSEP values of 74.567 and 79.158, respectively.
- Research Article
34
- 10.1016/j.microc.2021.106642
- Jul 14, 2021
- Microchemical Journal
Comparison of wavelength selected methods for improving of prediction performance of PLS model to determine aflatoxin B1 (AFB1) in wheat samples during storage
- Research Article
20
- 10.1016/j.microc.2024.110276
- Mar 2, 2024
- Microchemical Journal
Sustainable chemometric methods boosted by Latin hypercube technique for quantifying the recently FDA-approved combination of bupivacaine and meloxicam in the presence of bupivacaine carcinogenic impurity: Comprehensive greenness, blueness, and whiteness assessments
- Research Article
25
- 10.1186/s13065-024-01158-7
- Mar 18, 2024
- BMC Chemistry
Montelukast sodium (MLK) and Levocetirizine dihydrochloride (LCZ) are widely prescribed medications with promising therapeutic potential against COVID-19. However, existing analytical methods for their quantification are unsustainable, relying on toxic solvents and expensive instrumentation. Herein, we pioneer a green, cost-effective chemometrics approach for MLK and LCZ analysis using UV spectroscopy and intelligent multivariate calibration. Following a multilevel multifactor experimental design, UV spectral data were acquired for 25 synthetic mixtures and modeled via classical least squares (CLS), principal component regression (PCR), partial least squares (PLS), and genetic algorithm-PLS (GA-PLS) techniques. Latin hypercube sampling (LHS) was used to construct an optimal validation set of 13 mixtures for unbiased predictive performance assessment. Following optimization of the models regarding latent variables (LVs) and wavelength region, the optimum root mean square error of cross-validation (RMSECV) was attained at 2 LVs for the 210–400 nm spectral range (191 data points). The GA-PLS model demonstrated superb accuracy, with recovery percentages (R%) from 98 to 102% for both analytes, and root mean square error of calibration (RMSEC) and prediction (RMSEP) of (0.0943, 0.1872) and (0.1926, 0.1779) for MLK and LCZ, respectively, as well as bias-corrected mean square error of prediction (BCMSEP) of -0.0029 and 0.0176, relative root mean square error of prediction (RRMSEP) reaching 0.7516 and 0.6585, and limits of detection (LOD) reaching 0.0813 and 0.2273 for MLK and LCZ, respectively. Practical pharmaceutical sample analysis was successfully confirmed via standard additions. We further conducted pioneering multidimensional sustainability evaluations using state-of-the-art greenness, blueness, and whiteness tools. The method demonstrated favorable environmental metrics across all assessment tools.
The obtained Green National Environmental Method Index (NEMI) and Complementary Green Analytical Procedure Index (ComplexGAPI) quadrants affirmed green analytical principles. Additionally, the method had a high Analytical Greenness Metric (AGREE) score (0.90) and a low carbon footprint (0.021), indicating environmental friendliness. We also applied blueness and whiteness assessments using the Blue Applicability Grade Index (BAGI) and Red–Green–Blue 12 (RGB 12) algorithms. The high BAGI (90) and RGB 12 (90.8) scores confirmed the method's strong applicability, cost-effectiveness, and sustainability. This work puts forward an optimal, economically viable green chemistry paradigm for pharmaceutical quality control aligned with sustainable development goals.
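The Latin hypercube sampling used in this abstract to build a 13-mixture validation set can be sketched generically as follows. Each dimension of the unit interval is cut into as many equal strata as there are samples, one point is drawn per stratum, and the columns are shuffled independently. The two dimensions stand in for the two analyte levels; the function name, the seed, and the unit-interval scaling are assumptions, since the abstract gives no concentration ranges.

```python
import random

def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of n_samples equal-width strata
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(points)  # decouple the strata order between dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Hypothetical design: 13 mixtures, 2 analytes, levels scaled to [0, 1)
design = latin_hypercube(13, 2)
for point in design:
    print(point)
```

To turn such a design into concentrations, each coordinate would be rescaled linearly to the calibration range of the corresponding analyte.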
- Research Article
2
- 10.1080/01431161.2024.2402005
- Oct 3, 2024
- International Journal of Remote Sensing
Leaf nitrogen content (LNC) is an essential indicator of crop nitrogen status. To rapidly and correctly estimate the LNC using hyperspectral remote sensing, the canopy hyperspectral reflectance of coffee trees treated with five levels of nitrogen fertilization in a greenhouse was obtained in this study. Five methods were used for hyperspectral data preprocessing, namely, Savitzky–Golay (SG) smoothing, a combination of SG and standard normal variate transformation (SG-SNV), a combination of SG and first-order derivative (SG-FD), a combination of SG and second-order derivative (SG-SD), and a combination of SG and multiplicative scatter correction (SG-MSC). Feature wavelengths were extracted using variables combination population analysis (VCPA), competitive adaptive reweighted sampling (CARS), and the combination (CARS-SPA) of CARS and successive projections algorithm (SPA). Vegetation indexes (VIs) were constructed and subjected to correlation analysis and variance inflation factor (VIF) analysis. Linear and nonlinear models, including partial least squares regression (PLSR), back propagation neural network (BPNN), extreme learning machine (ELM), random forest regression (RFR), and support vector regression (SVR), were adopted to construct LNC retrieval models for coffee trees. The results indicated that SG-MSC could increase the signal-to-noise ratio of hyperspectral data well. The wavelengths selected by CARS-SPA were more relevant to LNC, and combined with ELM resulted in the best performance of LNC prediction (R²_P = 0.901, RMSEP = 0.825 g·kg−1, RPD = 3.229). Ten VIs were obtained through correlation analysis and VIF, and the VIs-based ELM prediction model also performed moderately well (R²_P = 0.814, RMSEP = 1.131 g·kg−1, RPD = 2.354).
By comparing the coffee LNC prediction models established by different methods, two coffee LNC inversion models with better prediction accuracy were obtained. These models provide a scientific basis for accurate diagnosis of coffee tree LNC and are of great significance for optimizing field management.