Abstract

Retention indices are values that characterize the retention of a compound in gas chromatography. In practice, retention indices are often assumed to depend only on the structure of the molecule and the type of the stationary phase, but this assumption is not accurate. This study examines the dependence of retention indices on the column heating rate in the linear temperature programming mode, using a large and diverse data set. Most records in the NIST 20 database were acquired in this mode. For stationary phases based on poly(5%-diphenyl-95%-dimethyl)siloxane (5%-phenyl-PDMS), there is a high proportion of records with heating rates of 10-15 K/min. In practice, such a high heating rate is rarely used, and the use of such data may cause errors. Groups of records were identified that were taken from the same primary source and recorded for the same compound and the same stationary phase, but that differ in heating rate. For each of these groups, the value D, the slope of the dependence of the retention index on the heating rate, was calculated. D can be either positive or negative. The highest values and the greatest variation of D are observed for polar stationary phases, but further analysis was restricted to 5%-phenyl-PDMS because of its greater practical significance. For these stationary phases, the highest D values are observed for aromatic and polyaromatic molecules; oxygen-containing compounds, in contrast, exhibit lower D values. Negative D values are observed for many trimethylsilyl derivatives. A data set of D values for 756 molecules was selected and published online. There is almost no correlation between D and the retention index, the lipophilicity factor logP, or the molecular weight. Significant correlations with the number of rings, the number of rotatable bonds, and the number of aromatic atoms were observed.
Linear equations quantitatively relating the molecular descriptors to the D value were constructed. The numbers of rings and halogen atoms were shown to contribute positively to the D value, while the numbers of oxygen atoms and of bonds subject to internal rotation contribute negatively. The strong influence of descriptors related to the conformational rigidity of molecules and the weak influence of polarity suggest that the entropic factor has a key influence on the D value. A simple empirical linear equation for estimating D is derived and presented in this study. Several machine learning methods for predicting D are compared. The best results are obtained with gradient boosting and random forest. However, random forest does not achieve high accuracy in predicting the retention indices themselves.
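
The calculation of D described in the abstract can be illustrated with a minimal sketch. It assumes that D is obtained as an ordinary least-squares slope of the retention index against the heating rate; the abstract does not state the fitting procedure. The function names, the record layout, and the numerical values below are hypothetical and are not taken from NIST 20.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (assumed fitting method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def compute_d(records):
    """Return D for each (source, compound, phase) group.

    records: iterable of tuples
        (source, compound, phase, heating_rate_K_per_min, retention_index)
    Groups with fewer than two distinct heating rates are skipped,
    because a slope cannot be defined for them.
    """
    groups = defaultdict(list)
    for source, compound, phase, rate, ri in records:
        groups[(source, compound, phase)].append((rate, ri))

    d_values = {}
    for key, points in groups.items():
        rates = [rate for rate, _ in points]
        if len(set(rates)) < 2:
            continue
        d_values[key] = slope(rates, [ri for _, ri in points])
    return d_values


# Hypothetical records for illustration only (made-up numbers).
records = [
    ("ref_A", "naphthalene", "5%-phenyl-PDMS", 2.0, 1180.0),
    ("ref_A", "naphthalene", "5%-phenyl-PDMS", 4.0, 1184.0),
    ("ref_A", "naphthalene", "5%-phenyl-PDMS", 6.0, 1188.0),
    ("ref_B", "naphthalene", "5%-phenyl-PDMS", 5.0, 1186.0),  # single rate: skipped
]

d_values = compute_d(records)
for key, d in d_values.items():
    print(key, d)
```

In this sketch D is expressed in retention index units per K/min. A positive D means the retention index grows with the heating rate, and a negative D means it decreases.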
