Reversed-phase (RP) liquid chromatography is an important tool for characterizing materials and products in the pharmaceutical industry. Method development remains challenging in this application space, particularly for closely related compounds. Models of chromatographic selectivity are useful for predicting which of the hundreds of available columns are likely to have very similar, or very different, selectivity for the application at hand. The hydrophobic subtraction model (HSM1) has been widely employed for this purpose; the column database for this model currently stands at 750 columns. In previous work we explored a refinement (HSM2) of the original HSM1 and found that increasing the size of the dataset used to train the model dramatically reduced the number of gross errors in its predictions of selectivity. In this paper we describe further work in this direction (HSM3), this time based on a much larger solute set (1014 solute/stationary phase combinations) containing selectivities for compounds that cover a broader range of physicochemical properties than those used for HSM1. The molecular weight range was doubled, and the range of the logarithm of the octanol/water partition coefficient was increased slightly. The number of active pharmaceutical ingredients and related synthetic intermediates and impurities was increased from four to 28, and ten pairs of closely related structures (e.g., geometric and cis-/trans- isomers) were included. The HSM3 model is based on retention measurements for 75 compounds on 13 RP stationary phases with a mobile phase of 40/60 acetonitrile/25 mM ammonium formate buffer at pH 3.2. This data-driven model predicted ln α (chromatographic selectivity, with ethylbenzene as the reference compound) with average absolute errors of approximately 0.033, which corresponds to errors in α of about 3%.
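The link between an error in ln α and the corresponding percentage error in α can be checked with a short calculation. The sketch below is illustrative only: it assumes the usual definition of selectivity as the ratio of a solute's retention factor to that of the reference compound (ethylbenzene here), and the function names and the example retention factors are hypothetical, not taken from the paper.

```python
import math


def ln_alpha(k_solute: float, k_reference: float) -> float:
    """ln(alpha), assuming alpha = k_solute / k_reference (assumed definition)."""
    return math.log(k_solute / k_reference)


def relative_error_in_alpha(ln_alpha_error: float) -> float:
    """Relative error in alpha implied by an absolute error in ln(alpha).

    If the predicted ln(alpha) is off by d, the predicted alpha is off by a
    factor of exp(d), i.e. a relative error of exp(d) - 1.
    """
    return math.exp(ln_alpha_error) - 1.0


# Average absolute error in ln(alpha) quoted in the abstract.
reported_error = 0.033
rel = relative_error_in_alpha(reported_error)
print(f"relative error in alpha: {rel:.1%}")

# Hypothetical retention factors, used only to show the definition in use.
example = ln_alpha(k_solute=2.0, k_reference=4.0)
print(f"ln(alpha) for the hypothetical pair: {example:.3f}")
```

For small errors, exp(d) − 1 is close to d itself, which is why an average error of about 0.033 in ln α translates to roughly 3% in α.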
In some cases, the prediction of the cis-/trans- selectivities for positional and geometric isomers was relatively accurate, and the driving forces for the observed selectivity could be inferred by examining the relative magnitudes of the terms in the HSM3 model. For some geometric isomer pairs, the interactions mainly responsible for the observed selectivities could not be rationalized because of large uncertainties in particular terms of the model. This suggests that further work is needed to explore other HSM-type models and to keep expanding the training dataset so that the predictive accuracy of these models continues to improve. Additionally, we release with this paper a much larger dataset (43,329 total retention measurements) at multiple mobile phase compositions, to enable other researchers to pursue their own lines of inquiry related to RP selectivity.