Structural Analysis and Classification of Low-Molecular-Weight Hyaluronic Acid by Near-Infrared Spectroscopy: A Comparison between Traditional Machine Learning and Deep Learning.

Lei Nie, Hengchang Zang, Lian Li, Weilu Tian, Siling Huang, Liang Zhong, Lixuan Zang, Xueping Guo

doi:10.3390/molecules28020809

Open Access

Journal: Molecules (Basel, Switzerland)
Publication Date: Jan 13, 2023
Citations: 2
License type: CC BY 4.0

Affiliation: Shandong University

Abstract

Confusing low-molecular-weight hyaluronic acid (LMWHA) produced by acid degradation with that produced by enzymatic hydrolysis (named LMWHA-A and LMWHA-E, respectively) can lead to health hazards and commercial risks. The purpose of this work is to analyze the structural differences between LMWHA-A and LMWHA-E, and then to classify the two quickly and accurately using near-infrared (NIR) spectroscopy and machine learning. First, we combined nuclear magnetic resonance (NMR), Fourier transform infrared (FTIR) spectroscopy, two-dimensional correlation NIR spectroscopy (2DCOS), and aquaphotomics to analyze the structural differences between LMWHA-A and LMWHA-E. Second, we compared three dimensionality reduction methods: principal component analysis (PCA), kernel PCA (KPCA), and t-distributed stochastic neighbor embedding (t-SNE). Finally, we compared the classification performance of traditional machine learning methods, namely partial least squares-discriminant analysis (PLS-DA), support vector classification (SVC), and random forest (RF), with that of deep learning methods, namely a one-dimensional convolutional neural network (1D-CNN) and long short-term memory (LSTM). Genetic algorithm (GA)-SVC and RF were the best performers among the traditional machine learning methods, but their highest accuracy on the test dataset was 90%, whereas the 1D-CNN and LSTM models reached 100% accuracy on both the training and test datasets. These results show that the deep learning models classified LMWHA-A and LMWHA-E better than traditional machine learning did. Our research provides a new methodological reference for the rapid and accurate classification of biological macromolecules.
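As a rough illustration of the dimensionality reduction step named in the abstract, the following is a minimal PCA sketch in Python with NumPy. It is not the authors' code or data: the spectra are synthetic, and the helper names (make_synthetic_spectra, pca), the band positions, the noise level, and the sample counts are all assumptions chosen only to make the example run.

```python
import numpy as np


def make_synthetic_spectra(n_per_class=20, n_wavelengths=200, seed=0):
    """Generate two classes of fake 'spectra' (assumed data, NOT real LMWHA measurements)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_wavelengths)
    baseline = np.exp(-((x - 0.5) ** 2) / 0.02)      # broad band shared by both classes
    class_band = np.exp(-((x - 0.7) ** 2) / 0.002)   # narrow band present only in class 1
    spectra, labels = [], []
    for label, scale in ((0, 0.0), (1, 0.3)):
        noise = rng.normal(0.0, 0.02, size=(n_per_class, n_wavelengths))
        spectra.append(baseline + scale * class_band + noise)
        labels.append(np.full(n_per_class, label))
    return np.vstack(spectra), np.concatenate(labels)


def pca(X, n_components=2):
    """PCA via singular value decomposition of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)                           # center each wavelength channel
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates on the PCs
    explained = (S ** 2) / np.sum(S ** 2)             # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]


X, y = make_synthetic_spectra()
scores, loadings, explained = pca(X, n_components=2)

# Distance between the two class means along the first principal component.
gap = abs(scores[y == 0, 0].mean() - scores[y == 1, 0].mean())

print("explained variance ratio:", np.round(explained, 3))
print("PC1 class-mean gap:", round(float(gap), 3))
```

In this toy setup the two classes differ by a single band, so the first component is expected to capture most of the between-class variation. Real NIR spectra would normally need preprocessing first, and the nonlinear methods mentioned in the abstract (KPCA, t-SNE) are not covered by this sketch.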

Full Text