Gas Chromatographic Retention Index Prediction Using Multimodal Machine Learning

Dmitriy D Matyushin, Aleksey K Buryak

doi:10.1109/access.2020.3045047


Open Access


Abstract

Gas chromatography is a widely used method in analytical chemistry and metabolomics. It separates vaporizable compounds so that they can subsequently be identified. Retention indices are standardized values that characterize the retention of a compound in a chromatographic system and depend only on the chemical structure of the compound and on the stationary phase. Retention index prediction is an important task because databases contain experimental values for only a small fraction of all possible molecules, while such values are useful for untargeted analysis. In this work, we consider four machine learning models for retention index prediction: 1D and 2D convolutional neural networks, a deep residual multilayer perceptron, and gradient boosting. Their respective inputs are a string representation of the molecule, a 2D representation of the chemical structure, molecular descriptors and fingerprints, and molecular descriptors, each combined with information about the stationary phase. The first and third models show the best performance, while the other two perform slightly worse. The models predict retention index values for various standard and semi-standard non-polar stationary phases. Performance was further improved by model stacking: a linear model that takes the predictions of the four models as its inputs. The models were tested on several diverse data sets: flavor compounds, essential oils, and metabolomics-related compounds. Depending on the test data set, the median absolute error was 6–40 units and the median percentage error was 0.8–2.2%. The stacking model outperforms previously reported approaches on all test data sets. Parameters of a pre-trained model and some source code are provided.
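The stacking step described above (a linear model whose inputs are the predictions of the four base models) can be illustrated as an ordinary least-squares fit. This is a minimal, dependency-free sketch, not the authors' released code: it assumes each base model has already produced one retention index prediction per molecule, and the function names `fit_stacking`, `predict_stacking`, and `median_errors` are hypothetical. It also shows how the two reported metrics, median absolute error and median percentage error, are computed.

```python
from statistics import median


def fit_stacking(preds, targets):
    """Hypothetical helper: least-squares fit of y ~ w0 + sum_k w_k * p_k.

    preds: one row per molecule; each row holds the retention index predicted
    by each base model (assumed order: 1D CNN, 2D CNN, MLP, gradient boosting).
    targets: experimental retention indices for the same molecules.
    """
    # Design matrix with a leading column of ones for the intercept w0.
    X = [[1.0] + [float(p) for p in row] for row in preds]
    m = len(X[0])
    # Normal equations A w = b, with A = X^T X and b = X^T y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution yields the weights [w0, w1, ..., wK].
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w


def predict_stacking(w, row):
    """Combine one molecule's base-model predictions with the fitted weights."""
    return w[0] + sum(wk * p for wk, p in zip(w[1:], row))


def median_errors(y_true, y_pred):
    """Return (median absolute error, median percentage error in %)."""
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    pct_err = [100.0 * e / abs(t) for e, t in zip(abs_err, y_true)]
    return median(abs_err), median(pct_err)
```

In a stacking setup the linear model is typically fitted on base-model predictions for molecules that were not used to train those base models; otherwise it learns to trust whichever model overfits most.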

Highlights

We tried multiple setups for the single-input multilayer perceptron (MLP), varying the number of layers (2 to 5), the number of nodes per layer, activation functions, regularization methods (L2, L1, dropout), and residual connections
In all cases we considered, the single-input MLP performed worse than gradient boosting trained on the same data set with the same feature set
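Residual connections, one of the MLP variations listed above, add a block's input back to the output of its layers, so a block with near-zero weights starts close to the identity mapping. The sketch below shows one generic residual block in plain Python; it is an illustration under assumptions, not the authors' architecture. The layer sizes, the choice of ReLU, and the placement of the addition are hypothetical, and the helper names `relu`, `dense`, and `residual_block` are invented for this example.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(W, b, x):
    """Fully connected layer: y_i = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def residual_block(x, W1, b1, W2, b2):
    """Two dense layers on the residual branch plus a skip connection.

    W2 must have len(x) rows so the branch output can be added to x.
    """
    h = relu(dense(W1, b1, x))   # hidden layer of the residual branch
    r = dense(W2, b2, h)         # project back to the input width
    return [xi + ri for xi, ri in zip(x, r)]  # skip connection: x + F(x)
```

Because the output width equals the input width, several such blocks can be chained before a final dense layer that outputs a single retention index.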

Summary

Introduction

Gas chromatography (GC) is an important method for separating and analyzing compounds and is widely used in metabolomics, environmental analysis, and other fields. Mixtures of vaporizable compounds can be separated efficiently and rapidly for subsequent detection and identification using electron ionization mass spectrometry (MS) or other methods. A mixture of vapors of the compounds to be separated moves with a stream of gas (the mobile phase) along the surface of a non-volatile liquid (the stationary phase). Separation is achieved because different compounds differ in volatility and in affinity for the stationary phase. As a result, different compounds are retained in the chromatographic system for different periods of time.
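Raw retention times depend on the column and the run conditions; retention indices normalize them against a homologous series of n-alkanes. For temperature-programmed GC, a common definition is the linear (van den Dool and Kratz) index, I = 100 [n + (t_x - t_n) / (t_{n+1} - t_n)], where t_n and t_{n+1} are the retention times of the n-alkanes with n and n+1 carbon atoms that elute just before and just after the analyte. This formula is standard background rather than something stated on this page, and the function name `linear_retention_index` and its data layout are illustrative assumptions.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkanes):
    """Hypothetical helper: linear (van den Dool and Kratz) retention index.

    t_x: retention time of the analyte (same units as the alkane times).
    alkanes: list of (carbon_number, retention_time) pairs sorted by
    retention time; at least two alkanes must be given.
    """
    times = [t for _, t in alkanes]
    if len(times) < 2 or not times[0] <= t_x <= times[-1]:
        raise ValueError("analyte is not bracketed by the alkane series")
    # Index of the last alkane eluting at or before the analyte,
    # clamped so that an analyte at the final alkane still has a bracket.
    i = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, t_n = alkanes[i]
    n_next, t_next = alkanes[i + 1]
    # Linear interpolation between the bracketing alkanes; the (n_next - n)
    # factor equals 1 for consecutive alkanes, matching the formula above.
    return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))
```

For example, with decane at 8.0 min and undecane at 10.0 min, an analyte eluting at 9.0 min lies halfway between them and receives an index of 1050.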


Journal: IEEE Access
Publication Date: Jan 1, 2020
Citations: 87
License type: CC BY 4.0
