Abstract

Quantitative Structure-Retention Relationships (QSRR) offer a valuable tool for de-risking chromatographic methods with respect to newly formed or hypothetical compounds arising from synthetic processes or formulation activities. They can also be used to identify optimal separation conditions or to support structural elucidation. In this contribution, we provide a systematic study of the relationship between the accuracy of the retention model, the size of the training set and its structural similarity to the predicted compound. We compare structural similarity expressed either on a fingerprint basis (e.g., Tanimoto index) or by the Euclidean distance calculated from a subset of molecular descriptors. The results indicate that accurate and predictive models can be built from a small dataset containing as few as 25 compounds, provided that the training set is structurally similar to the test compound. When the training set contains compounds selected by minimizing the Euclidean distance calculated from the three descriptors most correlated with the retention time, a root mean square error of 0.48 min and a correlation coefficient of 0.9464 were observed for the test sets of 104 compounds. Moreover, these models meet the Tropsha predictivity criteria. These findings potentially bring the prediction of retention times within the practical reach of pharmaceutical analysts involved in chromatographic method development. We also present an optimisation approach for selecting algorithm settings that minimize the prediction error and ensure model predictivity.
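
The similarity-based selection of a training set described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`top_correlated`, `nearest_training_set`, `tanimoto`), the use of the absolute Pearson correlation to rank descriptors, the unscaled descriptor values, and the representation of a fingerprint as a set of "on" bit positions are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def top_correlated(descriptors, retention_times, k=3):
    """Indices of the k descriptor columns with the largest |r| against
    the retention time (assumed ranking criterion)."""
    n_cols = len(descriptors[0])
    scores = [
        abs(pearson([row[j] for row in descriptors], retention_times))
        for j in range(n_cols)
    ]
    return sorted(range(n_cols), key=lambda j: scores[j], reverse=True)[:k]


def nearest_training_set(descriptors, target, cols, size=25):
    """Indices of the `size` compounds closest to `target` by Euclidean
    distance computed over the descriptor columns `cols` only."""
    def dist(row):
        return math.sqrt(sum((row[j] - target[j]) ** 2 for j in cols))

    order = sorted(range(len(descriptors)), key=lambda i: dist(descriptors[i]))
    return order[:size]


def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0
```

In use, `top_correlated` would be run once on the available compound library, and `nearest_training_set` would then be called for each compound to be predicted, so that every test compound gets its own small, structurally similar training set. In practice the descriptors would likely need to be scaled to comparable ranges before distances are computed; the sketch leaves that step out.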
