Abstract

The estimation of gas chromatographic retention indices from compound structures is an important problem. Predicted retention indices can be used in a mass spectral library search for the identification of unknowns. Various machine learning methods are used for this task, but methods based on decision trees, in particular gradient boosting, are not widely used. The aim of this work is to examine the usability of this method for retention index prediction. A total of 177 molecular descriptors computed with the Chemistry Development Kit are used as the input representation of a molecule. Random subsets of the whole NIST 17 database are used as training, test, and validation sets. The gradient boosting model uses 8000 trees with 6 leaves each. A neural network with one hidden layer (90 hidden nodes) is used for comparison. The same data sets and the same set of descriptors are used for the neural network and for gradient boosting. The model based on gradient boosting outperforms the neural network with one hidden layer for subsets of NIST 17 and for the set of essential oils. The performance of this model is comparable to or better than that of other modern retention prediction models. The average relative deviation is ~3.0% and the median relative deviation is ~1.7% for subsets of NIST 17. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, and squalane) are considered. Errors obtained with different machine learning algorithms and with the same representation of the molecule strongly correlate with each other.
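The three error measures quoted above can be illustrated with a minimal Python sketch. It assumes that the relative deviation is |predicted − reference| / reference expressed in percent, and that "median absolute deviation" means the median of the absolute prediction errors. The function name `deviation_summary` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def deviation_summary(reference, predicted):
    """Summarize prediction errors for paired retention indices.

    Assumed definitions (not confirmed by the source):
      relative deviation = 100 * |predicted - reference| / reference
      absolute deviation = |predicted - reference|
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and of equal length")

    # Absolute error for each compound, in retention index units.
    abs_err = [abs(p - r) for r, p in zip(reference, predicted)]
    # Relative error for each compound, in percent of the reference value.
    rel_err = [100.0 * e / r for e, r in zip(abs_err, reference)]

    return {
        "mean_relative_pct": mean(rel_err),
        "median_relative_pct": median(rel_err),
        "median_absolute": median(abs_err),
    }


if __name__ == "__main__":
    # Made-up reference and predicted retention indices, for illustration only.
    reference = [1000.0, 1200.0, 2000.0]
    predicted = [1010.0, 1170.0, 2100.0]
    print(deviation_summary(reference, predicted))
```

The median is less sensitive than the mean to a few badly predicted compounds, which is consistent with the median relative deviation being lower than the average one.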

Highlights

  • Gas chromatography is one of the most widely used methods of separation and chemical analysis

  • The hyphenated method gas chromatography–mass spectrometry is widely used for untargeted analysis, in particular in metabolomics and environmental analysis

  • The number of leaves was varied in the range 2-10, the number of trees in the range 1000-10000, and the shrinkage parameter and sampling fraction in the range 0-1


Summary

Introduction

Gas chromatography is one of the most widely used methods of separation and chemical analysis. The hyphenated method gas chromatography–mass spectrometry is widely used for untargeted analysis, in particular in metabolomics and environmental analysis. The retention time depends strongly on the conditions of the chromatographic experiment. The retention index (RI), by contrast, depends only on the chemical nature of the molecule and the stationary phase. Reference retention indices are available in public databases for fewer than 100,000 chemical compounds [3]. This is several times fewer than the number of compounds for which mass spectral information is available and almost a thousand times fewer than the number of all known compounds.
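To make the difference between retention time and retention index concrete, the following minimal Python sketch computes a linear retention index by interpolating between the n-alkanes that bracket an analyte. It assumes the linear (van den Dool and Kratz) definition commonly used with temperature-programmed runs; the source does not state which definition underlies its data. The function name `linear_retention_index` and the retention times are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_times):
    """Linear retention index of an analyte eluting at time t_x.

    alkane_times maps the carbon number of each reference n-alkane to its
    retention time, measured in the same run and in the same units as t_x.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if any(a >= b for a, b in zip(times, times[1:])):
        raise ValueError("alkane retention times must increase with carbon number")

    # Index of the last alkane that elutes at or before the analyte.
    i = bisect_right(times, t_x) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("analyte must elute at or after the first and before the last alkane")

    n, n_next = carbons[i], carbons[i + 1]
    # Fraction of the way from the earlier to the later bracketing alkane.
    fraction = (t_x - times[i]) / (times[i + 1] - times[i])
    return 100.0 * (n + (n_next - n) * fraction)


if __name__ == "__main__":
    # Made-up retention times (minutes) for decane, undecane, and dodecane.
    alkanes = {10: 5.0, 11: 7.0, 12: 9.5}
    print(linear_retention_index(6.0, alkanes))
```

Because the analyte is always placed relative to alkanes measured under the same conditions, a change in those conditions shifts the retention times of analyte and alkanes together and largely cancels out in the index.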

