Study on Rapid Detection of Original Distillate of Strong-Aroma Baijiu Acid and Ester Component Concentration by Machine Learning Combined with Fourier Transform Infrared

Abstract

Accurate detection of the key acid and ester components in the original distillate of strong-aroma Baijiu is crucial for Baijiu quality control and provides a scientific basis for blending and storage processes. Traditional detection methods rely primarily on sensory evaluation combined with gas chromatography analysis; these methods are time-consuming and subject to operator bias. To address this, we propose a novel hybrid framework integrating Fourier transform infrared (FTIR) spectroscopy and machine learning for rapid detection. FTIR spectra of original distillate samples of strong-aroma Baijiu from different distillation periods were collected and preprocessed using Savitzky-Golay (S-G) smoothing, first-order derivative, and standard normal variate (SNV) transformation. Feature selection was carried out using the least absolute shrinkage and selection operator (LASSO) and competitive adaptive reweighted sampling (CARS). Predictive models were then constructed using partial least squares regression (PLSR), long short-term memory (LSTM) networks, and least squares boosting (LSBoost), and the parameters of these models were optimized by the Grey Wolf Optimization (GWO) algorithm. The results show that the LASSO-GWO-LSTM model achieved the best performance in predicting the concentrations of ethyl acetate and lactic acid. For the ethyl acetate model, the root mean square error of prediction (RMSEP), R square of prediction (R²P), and relative percentage difference of prediction (RPDP) reached 0.0967, 0.9949, and 14.6591, respectively. For the lactic acid prediction model, the corresponding values were 0.0152, 0.9805, and 7.0378. These results far exceed those of conventional PLSR models, indicating that the combination of FTIR and machine learning (LASSO-GWO-LSTM) is a promising method for real-time, high-precision, non-destructive analysis of acid and ester concentrations in the original distillate of strong-aroma Baijiu.
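The three figures of merit quoted in the abstract can be computed from a prediction set as in the minimal sketch below. The abstract does not give its formulas, so this assumes the definitions commonly used in chemometrics: RMSEP as the root of the mean squared residual, R² as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function name and the sample values are illustrative only and are not taken from the paper.

```python
import math


def prediction_metrics(y_true, y_pred):
    """Return (RMSEP, R2P, RPD) for a prediction set.

    Assumed definitions (not stated in the abstract):
      RMSEP = sqrt(mean squared residual)
      R2P   = 1 - SSE / SST
      RPD   = sample SD of the reference values / RMSEP
    """
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    rmsep = math.sqrt(sse / n)
    r2p = 1.0 - sse / sst
    sd = math.sqrt(sst / (n - 1))  # sample standard deviation of the references
    rpd = sd / rmsep
    return rmsep, r2p, rpd


# Made-up reference and predicted concentrations, for illustration only.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.9, 5.1]
rmsep, r2p, rpd = prediction_metrics(y_ref, y_hat)
print(f"RMSEP={rmsep:.4f}  R2P={r2p:.4f}  RPD={rpd:.4f}")
```

A perfect prediction (RMSEP of zero) or a constant reference vector would make RPD or R² undefined; the sketch leaves those cases to raise a division error rather than guess a convention.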

Similar Papers
  • Research Article
  • 10.3390/agriculture15141557
Prediction of the Calorific Value and Moisture Content of Caragana korshinskii Fuel Using Hyperspectral Imaging Technology and Various Stoichiometric Methods
  • Jul 21, 2025
  • Agriculture
  • Xuehong De + 5 more

Calorific value and moisture content are the key indices for evaluating the quality and combustion characteristics of Caragana pellet fuel. Calorific value measures the energy released by energy plants during combustion and therefore determines energy utilization efficiency. At present, however, the calorific value of solid fuel is still determined in the laboratory by oxygen bomb calorimetry, which seriously hinders large-scale, rapid detection of fuel particles on industrial production lines. In response to this technical challenge, this study proposes using hyperspectral imaging technology combined with various chemometric methods to establish quantitative models for determining moisture content and calorific value in Caragana korshinskii fuel. A hyperspectral imaging system was used to capture spectral data in the 935–1720 nm range for 152 samples from multiple regions of the Inner Mongolia Autonomous Region. For moisture content and calorific value, three quantitative detection models were established: partial least squares regression (PLSR), random forest regression (RFR), and extreme learning machine (ELM). Monte Carlo cross-validation (MCCV) was used to remove outliers from the raw spectral data to improve model accuracy. Four preprocessing methods were applied to the spectral data, with standard normal variate (SNV) preprocessing performing best for the moisture content model and Savitzky–Golay (SG) preprocessing performing best for the calorific value model. Meanwhile, to improve prediction accuracy and reduce redundant wavelength data, we chose four feature extraction methods, namely competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA), genetic algorithm (GA), and iteratively retaining informative variables (IRIV), and combined them with the three models to build quantitative detection models based on the characteristic wavelengths for the moisture content and calorific value of Caragana korshinskii fuel. Finally, a comprehensive comparison of the modeling effectiveness of all methods was carried out. The SNV-IRIV-PLSR combination was the best for moisture content prediction, with a prediction set determination coefficient (R²P), root mean square error of prediction (RMSEP), and relative percentage deviation (RPD) of 0.9693, 0.2358, and 5.6792, respectively. This model was also used to generate a moisture content distribution map of Caragana fuel particles. The SG-CARS-RFR combination was the best for calorific value prediction, with R²P, RMSEP, and RPD of 0.8037, 0.3219, and 2.2864, respectively. This study provides an innovative technical solution for assessing the value and quality of Caragana fuel particles.
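As a rough consistency check on the figures in this abstract, R² and RPD are linked whenever RPD is taken as the standard deviation of the reference values divided by RMSEP. That definition is an assumption here, since the abstract does not state its formulas. Under it:

```latex
\[
\mathrm{RMSEP}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2, \qquad
\mathrm{SD}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2, \qquad
R_P^2 = 1-\frac{\sum_{i}(y_i-\hat{y}_i)^2}{\sum_{i}(y_i-\bar{y})^2}
\]
\[
\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSEP}}
 = \sqrt{\frac{n}{n-1}}\;\frac{1}{\sqrt{1-R_P^2}}
 \;\approx\; \frac{1}{\sqrt{1-R_P^2}}
\]
\[
R_P^2 = 0.9693 \;\Rightarrow\; \frac{1}{\sqrt{0.0307}} \approx 5.71 \quad (\text{reported RPD: } 5.6792)
\]
\[
R_P^2 = 0.8037 \;\Rightarrow\; \frac{1}{\sqrt{0.1963}} \approx 2.26 \quad (\text{reported RPD: } 2.2864)
\]
```

The approximations land close to the reported values. The small gaps could come from the n/(n-1) factor or from a slightly different definition of R² or RPD in the paper, neither of which can be determined from the abstract alone.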

  • Research Article
  • Cite count: 42
  • 10.1038/srep30313
Rapid detection of talcum powder in tea using FT-IR spectroscopy coupled with chemometrics.
  • Jul 29, 2016
  • Scientific Reports
  • Xiaoli Li + 2 more

This paper investigated the feasibility of using Fourier transform infrared transmission (FT-IR) spectroscopy with chemometric methods to detect talcum powder illegally added to tea. First, 210 samples of tea powder with 13 dose levels of talcum powder were prepared for FT-IR spectra acquisition. To highlight the slight variations in the FT-IR spectra, smoothing, normalization and standard normal variate (SNV) were employed to preprocess the raw spectra. Among them, SNV preprocessing performed best, giving a partial least squares (PLS) model with a high correlation coefficient of prediction (RP = 0.948) and a low root mean square error of prediction (RMSEP = 0.108). Then 18 characteristic wavenumbers were selected using a hybrid of backward interval partial least squares (biPLS) regression, the competitive adaptive reweighted sampling (CARS) algorithm and the successive projections algorithm (SPA). These characteristic wavenumbers accounted for only 0.64% of the full set of wavenumbers. The 18 characteristic wavenumbers were then used to build linear and nonlinear determination models by PLS regression and extreme learning machine (ELM), respectively. The optimal model, with RP = 0.963 and RMSEP = 0.137, was achieved by the ELM algorithm. These results demonstrated that FT-IR spectroscopy with chemometrics could be used successfully to detect talcum powder in tea.
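The SNV step that performed best here can be illustrated with a minimal sketch: each spectrum is centred on its own mean and divided by its own standard deviation, which removes additive offsets and multiplicative scaling between samples. The spectra below are made up, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not specify it.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own sample standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(x - mean) / sd for x in spectrum]


# Two made-up spectra with the same shape but a different offset and scale,
# as might arise from scatter or path-length differences.
base = [0.10, 0.40, 0.90, 0.30, 0.20]
shifted = [2.0 * x + 0.5 for x in base]

snv_base = snv(base)
snv_shifted = snv(shifted)
print(snv_base)
print(snv_shifted)
```

After SNV the two spectra coincide up to floating-point rounding, which is the property that makes the transform useful before regression.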

  • Research Article
  • Cite count: 9
  • 10.1142/s1793545815500340
Application of near infrared spectroscopy in monitoring the moisture content in freeze-drying process of human coagulation factor VIII
  • Oct 27, 2015
  • Journal of Innovative Optical Health Sciences
  • Fei Wang + 7 more

As an important process analysis tool, near infrared spectroscopy (NIRS) has been widely used in process monitoring. In the present work, the feasibility of NIRS for monitoring the moisture content of human coagulation factor VIII (FVIII) during the freeze-drying process was investigated. A partial least squares regression (PLS-R) model for moisture content determination was built with 88 samples. Different pre-processing methods were explored, and the best method found was standard normal variate (SNV) transformation combined with the first derivative with Savitzky–Golay (SG) 15-point smoothing. Then, four different variable selection methods, including uninformative variable elimination (UVE), interval partial least squares regression (iPLS), competitive adaptive reweighted sampling (CARS) and a manual method, were compared for eliminating irrelevant variables, and iPLS was chosen as the best variable selection method. The correlation coefficient (R), correlation coefficient of the calibration set (Rcal), correlation coefficient of the validation set (Rval), root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP) of the PLS model were 0.9284, 0.9463, 0.8890, 0.4986% and 0.4514%, respectively. The results showed that the model for moisture content determination has a wide range, good linearity, accuracy and precision. The developed approach was demonstrated to be a potential tool for monitoring the moisture content of FVIII during the freeze-drying process.
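A Savitzky–Golay first derivative over a 15-point window, as named in this abstract, can be sketched in a few lines. The abstract does not give the polynomial order, so the sketch assumes a linear or quadratic fit, for which the derivative weights at the window centre reduce to k / Σk² for offsets k = -m..m. Only interior points are returned, and the test signal is synthetic.

```python
def sg_first_derivative(y, window=15, spacing=1.0):
    """Savitzky-Golay first derivative for a linear/quadratic local fit.

    Returns derivative estimates for interior points only, so the output
    is shorter than the input by (window - 1) samples.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    if len(y) < window:
        raise ValueError("signal is shorter than the window")
    m = window // 2
    norm = sum(k * k for k in range(-m, m + 1))
    weights = [k / norm for k in range(-m, m + 1)]
    derivative = []
    for i in range(m, len(y) - m):
        segment = y[i - m:i + m + 1]
        derivative.append(sum(w * v for w, v in zip(weights, segment)) / spacing)
    return derivative


# Synthetic check: for y = x^2 on unit spacing the derivative is 2x.
x = list(range(30))
y = [float(v * v) for v in x]
dy = sg_first_derivative(y, window=15)
print(dy[0], dy[-1])  # estimates at x = 7 and x = 22
```

In practice an established library routine would normally be used; the hand-written version is shown only to make the weighting explicit.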

  • Research Article
  • Cite count: 27
  • 10.1016/j.jfca.2022.104938
Application of near-infrared hyperspectral imaging coupled with chemometrics for rapid and non-destructive prediction of protein content in single chickpea seed
  • Sep 27, 2022
  • Journal of Food Composition and Analysis
  • Dhritiman Saha + 4 more


  • Research Article
  • Cite count: 74
  • 10.1016/j.saa.2019.118005
Rapid detection of adulteration of minced beef using Vis/NIR reflectance spectroscopy with multivariate methods.
  • Jan 14, 2020
  • Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
  • Shizhuang Weng + 7 more


  • Research Article
  • Cite count: 63
  • 10.1016/j.jpba.2008.09.054
Quantitative solid-state analysis of three solid forms of ranitidine hydrochloride in ternary mixtures using Raman spectroscopy and X-ray powder diffraction
  • Oct 17, 2008
  • Journal of Pharmaceutical and Biomedical Analysis
  • Norman Chieng + 4 more


  • Research Article
  • Cite count: 62
  • 10.1016/j.infrared.2019.103034
Rapid prediction and visualization of moisture content in single cucumber (Cucumis sativus L.) seed using hyperspectral imaging technology
  • Sep 12, 2019
  • Infrared Physics & Technology
  • Yunfei Xu + 6 more


  • Research Article
  • Cite count: 25
  • 10.1016/j.biosystemseng.2019.06.010
Utilising near-infrared hyperspectral imaging to detect low-level peanut powder contamination of whole wheat flour
  • Jun 19, 2019
  • Biosystems Engineering
  • Xin Zhao + 5 more


  • Book Chapter
  • Cite count: 2
  • 10.1007/978-981-19-4884-8_32
LASSO Based Extreme Learning Machine for Spectral Multivariate Calibration of Complex Samples
  • Jan 1, 2022
  • Zizhen Zhao + 4 more

Extreme learning machine (ELM) has received increasing attention in multivariate calibration of complex samples due to its fast learning speed and good generalization ability. However, variables in the spectral matrix that are irrelevant to the target can degrade the quality of ELM modeling. Therefore, variable selection is required before multivariate calibration. In this study, the least absolute shrinkage and selection operator (LASSO) combined with ELM (LASSO-ELM) is used for spectral quantitative analysis of complex samples. In the method, LASSO is first used to select variables by shrinking the regression coefficients of unselected variables to zero. The optimal model position s of LASSO is determined by the Sp criterion. Then an ELM model is built between the selected variables and the analyzed target, with the optimal activation function and hidden node number determined by the ratio of mean to standard deviation of correlation coefficients (MSR). Near infrared (NIR) spectra of tobacco lamina and ultraviolet (UV) spectra of fuel oil samples are used to evaluate the prediction performance of LASSO-ELM. Results show that, with only tens of variables, LASSO-ELM achieves the lowest root mean square error of prediction (RMSEP) and highest correlation coefficient (R) compared with full-spectrum partial least squares (PLS) and ELM. Thus, LASSO-ELM is an effective variable selection and multivariate calibration method for quantitative analysis of complex samples.
Keywords: Least absolute shrinkage and selection operator; Extreme learning machine; Variable selection; Spectral analysis; Quantification
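How LASSO shrinks the coefficients of uninformative variables exactly to zero can be shown with a small coordinate-descent sketch. This is a generic textbook solver for the objective (1/2n)·||y − Xw||² + alpha·||w||₁ on centred data, not the authors' implementation: the penalty alpha is hand-picked rather than chosen by the Sp criterion mentioned in the abstract, and the tiny data set is made up.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X and the vector y are already centred,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual, with column j's
            # current contribution added back in.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


# Made-up centred data: y depends strongly on variable 0, weakly on
# variable 1, and not at all on variable 2.
X = [[1.0, 1.0, 1.0],
     [-1.0, 1.0, -1.0],
     [1.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.0 * r[0] + 0.05 * r[1] for r in X]

w = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

With this penalty only variable 0 survives, which mirrors the variable-selection role LASSO plays ahead of the ELM model; a larger or smaller alpha would keep fewer or more variables.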

  • Research Article
  • 10.30560/as.v7n1p84
Study On the Detection of Dry Matter in Silage Corn Feed Based on Near Infrared Spectroscopy
  • Apr 13, 2025
  • Agricultural Science
  • Changfeng Shao + 1 more

This study explored the application of a portable near-infrared (NIR) spectrometer for analyzing silage corn feed quality, specifically focusing on developing a quantitative detection model for dry matter content. Spectral data were collected within the 855-1890 nm range using a portable NIR spectrometer, and the dataset was partitioned into calibration and prediction sets using the SPXY algorithm. An Extreme Learning Machine (ELM) model optimized by Particle Swarm Optimization (PSO) was employed for modeling. Five preprocessing methods were evaluated: Moving Average Filter (MAF), Savitzky-Golay Filter (SGF), Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV) transformation, and First Derivative (FD). To enhance model performance, feature wavelengths were selected using three methods: Bootstrap Soft Shrinkage (BOSS), Competitive Adaptive Reweighted Sampling (CARS), and Iterative Retained Information Variable (IRIV). The optimal model combining SNV preprocessing with BOSS feature selection achieved a prediction correlation coefficient of 0.8708 and Root Mean Square Error of Prediction (RMSEP) of 0.6802. These results demonstrate the potential of portable NIR spectroscopy for rapid dry matter content determination in silage corn feed.

  • Research Article
  • Cite count: 7
  • 10.3390/s19143124
Hyperspectral Imaging for the Nondestructive Quality Assessment of the Firmness of Nanguo Pears Under Different Freezing/Thawing Conditions.
  • Jul 15, 2019
  • Sensors
  • Zhe Zhang + 6 more

Firmness changes in Nanguo pears under different freezing/thawing conditions have been characterized by hyperspectral imaging (HSI). Four different freezing/thawing conditions (critical temperature, number of cycles, holding time and cooling rate) were set in this experiment. Four different pretreatment methods were used: multiplicative scattering correction (MSC), standard normal variate (SNV), Savitzky-Golay standard normal variate (S-G-SNV) and Savitzky-Golay multiplicative scattering correction (S-G-MSC). With competitive adaptive reweighted sampling (CARS) used to identify characteristic wavelengths, firmness prediction models of Nanguo pears under different freezing/thawing conditions were established by partial least squares (PLS) regression. The performance of the firmness model was analyzed quantitatively by the correlation coefficient (R), the root mean square error of calibration (RMSEC), the root mean square error of prediction (RMSEP) and the root mean square error of cross validation (RMSECV). The results showed that the MSC-PLS model has the highest accuracy at different cooling rates and holding times; the correlation coefficients of the calibration set (Rc) were 0.899 and 0.927, respectively, and the correlation coefficients of the validation set (Rp) were 0.911 and 0.948, respectively. The accuracy of the SNV-PLS model was the highest at different numbers of cycles, and the Rc and the Rp were 0.861 and 0.848, respectively. The RMSEC was 65.189, and the RMSEP was 65.404. The accuracy of the S-G-SNV-PLS model was the highest at different critical temperatures, with Rc and Rp values of 0.854 and 0.819, respectively, and RMSEC and RMSEP values of 74.567 and 79.158, respectively.

  • Research Article
  • Cite count: 34
  • 10.1016/j.microc.2021.106642
Comparison of wavelength selected methods for improving of prediction performance of PLS model to determine aflatoxin B1 (AFB1) in wheat samples during storage
  • Jul 14, 2021
  • Microchemical Journal
  • Hui Jiang + 2 more


  • Research Article
  • Cite count: 20
  • 10.1016/j.microc.2024.110276
Sustainable chemometric methods boosted by Latin hypercube technique for quantifying the recently FDA-approved combination of bupivacaine and meloxicam in the presence of bupivacaine carcinogenic impurity: Comprehensive greenness, blueness, and whiteness assessments
  • Mar 2, 2024
  • Microchemical Journal
  • Michael K Halim + 2 more


  • Research Article
  • Cite count: 25
  • 10.1186/s13065-024-01158-7
Four chemometric models enhanced by Latin hypercube sampling design for quantification of anti-COVID drugs: sustainability profiling through multiple greenness, carbon footprint, blueness, and whiteness metrics
  • Mar 18, 2024
  • BMC Chemistry
  • Noha S Katamesh + 2 more

Montelukast sodium (MLK) and Levocetirizine dihydrochloride (LCZ) are widely prescribed medications with promising therapeutic potential against COVID-19. However, existing analytical methods for their quantification are unsustainable, relying on toxic solvents and expensive instrumentation. Herein, we pioneer a green, cost-effective chemometrics approach for MLK and LCZ analysis using UV spectroscopy and intelligent multivariate calibration. Following a multilevel multifactor experimental design, UV spectral data were acquired for 25 synthetic mixtures and modeled via classical least squares (CLS), principal component regression (PCR), partial least squares (PLS), and genetic algorithm-PLS (GA-PLS) techniques. Latin hypercube sampling (LHS) strategically constructed an optimal validation set of 13 mixtures for unbiased predictive performance assessment. Following optimization of the models regarding latent variables (LVs) and wavelength region, the optimum root mean square error of cross-validation (RMSECV) was attained at 2 LVs for the 210–400 nm spectral range (191 data points). The GA-PLS model demonstrated superb accuracy, with recovery percentages (R%) from 98 to 102% for both analytes, and root mean square error of calibration (RMSEC) and prediction (RMSEP) of (0.0943, 0.1872) and (0.1926, 0.1779) for MLK and LCZ, respectively, as well as bias-corrected mean square error of prediction (BCMSEP) of -0.0029 and 0.0176, relative root mean square error of prediction (RRMSEP) reaching 0.7516 and 0.6585, and limits of detection (LOD) reaching 0.0813 and 0.2273 for MLK and LCZ, respectively. Practical pharmaceutical sample analysis was successfully confirmed via standard additions. We further conducted pioneering multidimensional sustainability evaluations using state-of-the-art greenness, blueness, and whiteness tools. The method demonstrated favorable environmental metrics across all assessment tools. The obtained Green National Environmental Method Index (NEMI) and Complementary Green Analytical Procedure Index (ComplexGAPI) quadrants affirmed green analytical principles. Additionally, the method had a high Analytical Greenness Metric (AGREE) score (0.90) and a low carbon footprint (0.021), indicating environmental friendliness. We also applied blueness and whiteness assessments using the Blue Applicability Grade Index (BAGI) and Red–Green–Blue 12 (RGB 12) algorithms. The high BAGI (90) and RGB 12 (90.8) scores confirmed the method's strong applicability, cost-effectiveness, and sustainability. This work puts forward an optimal, economically viable green chemistry paradigm for pharmaceutical quality control aligned with sustainable development goals.
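The idea behind a Latin hypercube validation design can be sketched as follows: each factor's range is cut into as many equal strata as there are samples, one point is drawn per stratum, and the strata are shuffled independently per factor so every level is covered exactly once. This is a generic illustration rather than the authors' procedure; the two concentration ranges are placeholders (the abstract does not state them), and only the count of 13 mixtures comes from the abstract.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that, in every dimension, each of the
    n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly random point inside each stratum of this dimension.
        points = [low + (high - low) * (k + rng.random()) / n_samples
                  for k in range(n_samples)]
        rng.shuffle(points)  # decouple stratum order between dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Placeholder ranges for two analytes on a normalised 0-1 scale (assumed).
bounds = [(0.0, 1.0), (0.0, 1.0)]
design = latin_hypercube(13, bounds, seed=42)
for mixture in design:
    print(mixture)
```

Compared with purely random sampling, this guarantees that the validation mixtures span the full range of each analyte, which is the property that motivates its use for an unbiased validation set.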

  • Research Article
  • Cite count: 2
  • 10.1080/01431161.2024.2402005
Predicting leaf nitrogen content of coffee trees using the canopy hyperspectral reflectance feature bands, vegetation index and machine learning
  • Oct 3, 2024
  • International Journal of Remote Sensing
  • Xiaogang Liu + 8 more

Leaf nitrogen content (LNC) is an essential indicator of crop nitrogen status. To rapidly and accurately estimate LNC using hyperspectral remote sensing, the canopy hyperspectral reflectance of coffee trees treated with five levels of nitrogen fertilization in a greenhouse was obtained in this study. Five methods were used for hyperspectral data preprocessing, namely, Savitzky–Golay (SG) smoothing, a combination of SG and standard normal variate transformation (SG-SNV), a combination of SG and first-order derivative (SG-FD), a combination of SG and second-order derivative (SG-SD), and a combination of SG and multiplicative scatter correction (SG-MSC). Feature wavelengths were extracted using variables combination population analysis (VCPA), competitive adaptive reweighted sampling (CARS), and the combination (CARS-SPA) of CARS and successive projections algorithm (SPA). Vegetation indexes (VIs) were constructed and subjected to correlation analysis and variance inflation factor (VIF) analysis. Linear and nonlinear models, including partial least squares regression (PLSR), back propagation neural network (BPNN), extreme learning machine (ELM), random forest regression (RFR), and support vector regression (SVR), were adopted to construct LNC retrieval models for coffee trees. The results indicated that SG-MSC could increase the signal-to-noise ratio of the hyperspectral data well. The wavelengths selected by CARS-SPA were more relevant to LNC and, combined with ELM, gave the best LNC prediction performance (R²P = 0.901, RMSEP = 0.825 g·kg−1, RPD = 3.229). Ten VIs were obtained through correlation analysis and VIF, and the VIs-based ELM prediction model also performed moderately well (R²P = 0.814, RMSEP = 1.131 g·kg−1, RPD = 2.354). By comparing the coffee LNC prediction models established by different methods, two coffee LNC inversion models with better prediction accuracy were obtained, which provide a scientific basis for accurate diagnosis of coffee tree LNC and are of great significance for optimizing field management.
