Retention indices are widely used in gas chromatography and gas chromatography-mass spectrometry as an additional identification criterion alongside the mass spectrum. Reference retention index data are available for only a limited number of molecules; in other cases, retention indices predicted by mathematical models can be used. Models developed before 2018 mostly suffer from either very low accuracy or a very narrow domain of applicability. Since 2018, however, the situation has begun to change: deep neural networks and large training sets (mainly various versions of the NIST database) have made it possible to build retention index prediction models that are both accurate and general-purpose, with accuracy improving over time. At least seven deep learning-based models for predicting gas chromatographic retention indices have been released in the public domain in recent years. Their authors invariably claim that their model is more accurate than its predecessors, yet in no case has the accuracy been measured independently. This work aimed to compare retention index prediction models and the corresponding software objectively and critically, using a single retention data set guaranteed not to overlap with the training sets used by the authors of the models. Seven models and the corresponding software were considered, including MetExpert (2018), DeepReI (2021), SVEKLA (2021), and AIRI (2024). It was shown that for the non-polar stationary phase (ZB-5MS), the accuracy of the newest models is quite high and gradually approaches that of the reference libraries; the newer models are indeed more accurate than the older ones. For the polar stationary phase (SH-Stabilwax), by contrast, the accuracy on the independent data set is very low and significantly below that reported in the original papers describing the predictive models.
For users with limited experience, compiling and running the software can be challenging, particularly when attempting to do so several years after publication, most often because of incompatibility between model files and newer versions of the underlying frameworks. It is not uncommon for software authors to discontinue all support for the software once the accompanying article has been published.