Localised quantitative structure–retention relationship modelling for rapid method development in reversed-phase high performance liquid chromatography

Soo Hyun Park, Mauro De Pra, Paul R. Haddad, Sylvia Grosse, Christopher A. Pohl, Frank Steiner

doi:10.1016/j.chroma.2019.460508

Abstract

Quantitative structure–retention relationships (QSRR) that predict the solute “hydrophobicity” coefficient η′ of the approximate hydrophobic subtraction model (HSM) can be used to predict retention times of compounds on numerous reversed-phase (RP) columns, provided that the corresponding column parameters for those stationary phases are available. In the present study, we propose a new dual clustering-based localised QSRR approach that combines P-ratio clustering (where P is the octanol–water partition coefficient) with second dominant interaction (SDI)-based clustering. The aim is to produce predictive models accurate enough for in silico column scoping in RP method development. QSRR models for η′ values were derived for 49 of the 63 compounds in a dataset extracted from the literature, in which retention data were measured under a single isocratic mobile-phase condition (acetonitrile–water, 50:50 [v/v]). For the modelling, a genetic algorithm–partial least squares regression (GA-PLS) approach was applied to the η′ values and their relevant molecular descriptors. These models gave an external predictive squared correlation coefficient Q²ext(F2) of 0.83 and a root mean square error of prediction (RMSEP) of 0.14. The corresponding retention times were then predicted by substituting the predicted η′ values and the stationary-phase “hydrophobicity” parameter H of each column into the approximate HSM. This gave excellent accuracy and predictability (Q²ext(F2) of 0.90 and RMSEP of 0.72 min). The established QSRR approach was experimentally verified on six Thermo Scientific columns (Acclaim™ 120 C18, Acclaim Polar Advantage, Acclaim Polar Advantage II, Accucore™ aQ, Accucore Phenyl-X, and Hypersil Gold C18) using two test sets. The first test set consisted of eight model compounds taken from the original dataset, and retention-time predictions for these compounds were evaluated on the six columns.
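The sketch below illustrates how a predicted η′ value becomes a retention time, and how the reported figures of merit are calculated. It assumes the approximate HSM in its usual literature form, log10(k/k_EB) ≈ η′H, where k_EB is the retention factor of the reference solute ethylbenzene on the column. It also assumes that t_R = t0(1 + k), with t0 the column dead time. Neither k_EB nor t0 is specified in this abstract, so both are treated as user-supplied inputs. Q²ext(F2) follows the common definition, 1 − PRESS / Σ(y_obs − ȳ_ext)², where ȳ_ext is the mean of the external-set observations. This is an illustrative sketch, not the authors' code.

```python
import math
from typing import Sequence


def predict_retention_time(eta_prime: float, H: float, k_eb: float, t0: float) -> float:
    """Approximate HSM (assumed form): log10(k / k_EB) = eta' * H, then t_R = t0 * (1 + k).

    k_eb (ethylbenzene retention factor) and t0 (dead time, min) are assumed inputs
    for the specific column and mobile phase.
    """
    k = k_eb * 10 ** (eta_prime * H)
    return t0 * (1.0 + k)


def rmsep(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def q2_f2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Q2_F2 for an external set: 1 - PRESS / sum of squares about the external-set mean."""
    mean_ext = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_ext = sum((o - mean_ext) ** 2 for o in observed)
    return 1.0 - press / ss_ext
```

For example, with hypothetical values η′ = 1.0, H = 1.0, k_EB = 2.0 and t0 = 1.0 min, the sketch gives k = 20 and t_R = 21.0 min. Setting η′ = 0 reduces the prediction to the ethylbenzene retention time on that column.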
The results showed good agreement between predicted and observed retention times, with acceptable prediction errors (slope of 0.97, Q²ext(F2) of 0.95, mean absolute error (MAE) of 0.43 min and RMSEP of 0.61 min). The second test set comprised eight compounds not present in the original dataset, all of which were classified into the η′ cluster by applying a Tanimoto similarity (TS) threshold of 0.7. Their predicted retention times were likewise compared with the observed values, giving acceptable predictions (slope of 0.99, Q²ext(F2) of 0.93 and RMSEP of 0.52 min). Predicted resolution values were then compared across columns to select the most suitable columns for separating the compounds in each test set. Chromatograms recorded on the chosen columns confirmed that, based on predicted resolution values, effective column scoping is feasible without experimentation on the numerous RP stationary phases listed on the USP website.
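The two selection steps described above, assigning a new compound to a cluster by Tanimoto similarity and choosing a column from predicted resolution, can be sketched as follows. The abstract does not state the exact rules, so several choices here are hypothetical:

- Fingerprints are represented as sets of "on" bit indices.
- A compound is assigned to a cluster when its best match among the cluster members reaches TS ≥ 0.7.
- Peak widths are estimated from an assumed column plate count N as w = 4t_R/√N, which assumes Gaussian peaks.
- The "best" column is taken to be the one whose worst-separated adjacent pair has the largest Rs = 2(t2 − t1)/(w1 + w2).

`min_resolution` needs at least two peaks.

```python
import math
from typing import Dict, Iterable, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_cluster(query: Set[int], members: Iterable[Set[int]], threshold: float = 0.7) -> bool:
    """Hypothetical rule: accept the query if its most similar cluster member reaches the threshold."""
    return max((tanimoto(query, m) for m in members), default=0.0) >= threshold


def min_resolution(retention_times: List[float], plates: float) -> float:
    """Smallest Rs between adjacent peaks; baseline width assumed to be w = 4 * t_R / sqrt(N)."""
    t = sorted(retention_times)
    widths = [4.0 * tr / math.sqrt(plates) for tr in t]
    return min(
        2.0 * (t[i + 1] - t[i]) / (widths[i] + widths[i + 1])
        for i in range(len(t) - 1)
    )


def best_column(predicted: Dict[str, List[float]], plates: Dict[str, float]) -> str:
    """Hypothetical criterion: pick the column whose critical (lowest-Rs) pair is best resolved."""
    return max(predicted, key=lambda col: min_resolution(predicted[col], plates[col]))
```

For instance, with hypothetical predicted retention times {"column_A": [2.0, 4.0], "column_B": [2.0, 2.5]} and N = 16 for both, the critical-pair resolutions are about 0.67 and 0.22, so `best_column` returns "column_A". A max-min criterion is only one reasonable choice. A method developer might instead require every pair to exceed a target Rs such as 1.5.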

Full Text