Abstract

Rapid yet accurate pKa prediction for druglike molecules is a key challenge in computational chemistry. This study uses PM6-DH+/COSMO, PM6/COSMO, PM7/COSMO, PM3/COSMO, AM1/COSMO, PM3/SMD, AM1/SMD, and DFTB3/SMD to predict the pKa values of 53 amine groups in 48 druglike compounds. The approach uses an isodesmic reaction where the pKa value is computed relative to a chemically related reference compound for which the pKa value has been measured experimentally or estimated using a standard empirical approach. The AM1- and PM3-based methods perform best with RMSE values of 1.4-1.6 pH units that have uncertainties of ±0.2-0.3 pH units, which make them statistically equivalent. However, for all but PM3/SMD and AM1/SMD the RMSEs are dominated by a single outlier, cefadroxil, caused by proton transfer in the zwitterionic protonation state. If this outlier is removed, the RMSE values for PM3/COSMO and AM1/COSMO drop to 1.0 ± 0.2 and 1.1 ± 0.3, whereas PM3/SMD and AM1/SMD remain at 1.5 ± 0.3 and 1.6 ± 0.3/0.4 pH units, making the COSMO-based predictions statistically better than the SMD-based predictions. For pKa calculations where a zwitterionic state is not involved or proton transfer in a zwitterionic state is not observed, PM3/COSMO or AM1/COSMO is the best pKa prediction method; otherwise PM3/SMD or AM1/SMD should be used. Thus, fast and relatively accurate pKa prediction for 100-1000s of druglike amines is feasible with the current setup and relatively modest computational resources.
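
The isodesmic scheme described in the abstract can be illustrated with a short sketch. For the proton exchange BH+ + B_ref → B + B_refH+, standard thermodynamics gives pKa = pKa_ref + ΔG/(RT ln 10), where ΔG is the free energy change of the exchange. The function name, the temperature of 298.15 K, the kcal/mol units, and the example energies below are assumptions made for illustration; this section does not state how the study obtains the individual energies.

```python
import math

# Gas constant in kcal/(mol K); the units are an assumption of this sketch.
R_KCAL = 1.98720425864083e-3


def isodesmic_pka(g_bh, g_b, g_ref_h, g_ref, pka_ref, temperature=298.15):
    """Estimate a pKa relative to a reference compound.

    Reaction considered: BH+ + B_ref -> B + B_refH+
    g_bh, g_b         energies of the protonated and neutral target (kcal/mol)
    g_ref_h, g_ref    energies of the protonated and neutral reference (kcal/mol)
    pka_ref           known pKa of the reference compound
    """
    # Energy change of the proton exchange reaction (products minus reactants).
    delta_g = (g_b + g_ref_h) - (g_bh + g_ref)
    # Convert the energy change to pKa units and shift by the reference pKa.
    return pka_ref + delta_g / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    # Hypothetical energies, not taken from the study.
    estimate = isodesmic_pka(g_bh=150.0, g_b=10.0, g_ref_h=152.0, g_ref=13.0,
                             pka_ref=10.6)
    print(f"Estimated pKa: {estimate:.2f}")
```

Because only the energy difference enters, systematic errors shared by the target and a chemically similar reference largely cancel, which is the motivation for choosing a closely related reference compound.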

Highlights

  • One of the central practical challenges to be met when performing calculations on many organic molecules in aqueous solution is selecting the correct protonation state at a given pH

  • PM3- and AM1-based methods: Table 1 lists the predicted pKa values, Figure 2 shows a plot of the errors, and Table 2 lists the root-mean-square error (RMSE) and maximum absolute error for each method

  • The null model pKa ≈ pKa^ref has an RMSE of 1.8 ± 0.3/0.4
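
The error statistics named in the highlights can be sketched as follows: RMSE and maximum absolute error for a set of predictions, plus the same statistics for a null model that simply returns the reference compound's pKa. All numbers and names in the example are hypothetical and are not data from the study; the sketch also does not reproduce the uncertainty estimates (the ± values) quoted in the text.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def max_abs_error(predicted, reference):
    """Largest absolute deviation between prediction and reference value."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return max(abs(p - r) for p, r in zip(predicted, reference))


if __name__ == "__main__":
    # Hypothetical values for four amines (illustration only).
    experimental = [9.5, 8.1, 10.2, 7.4]
    predicted = [9.9, 7.2, 10.0, 8.6]
    # Null model: use the reference compound's pKa as the prediction.
    reference_pkas = [10.6, 9.0, 10.6, 9.0]

    print("method RMSE:", round(rmse(predicted, experimental), 2))
    print("method max |error|:", round(max_abs_error(predicted, experimental), 2))
    print("null model RMSE:", round(rmse(reference_pkas, experimental), 2))
```

A method is only worth its cost if its RMSE is clearly lower than that of the null model, which is why the null-model RMSE is a useful baseline alongside the per-method values.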

Summary

INTRODUCTION

One of the central practical challenges to be met when performing calculations on many organic molecules in aqueous solution is selecting the correct protonation state at a given pH. Settimo et al. (2013) have recently shown that the empirical methods can fail for some amines, which represent a large fraction of drugs currently on the market or in development. This problem could make these methods difficult to apply to computational exploration of chemical space (Rupakheti et al., 2016; Gomez-Bombarelli et al., 2016), where molecules with completely new chemical substructures are likely to be encountered. When applied to larger molecules (Eckert and Klamt, 2005; Klicic et al., 2002), some degree of empiricism is usually introduced to increase the accuracy of the predictions, but these parameters tend to be much more transferable because of the underlying QM model. These QM-based methods are computationally quite demanding and cannot be routinely applied to the very large sets of molecules typically encountered in high-throughput screening. In addition, I test more semiempirical methods than in the previous study.

COMPUTATIONAL METHODOLOGY
RESULTS AND DISCUSSION
SUMMARY AND OUTLOOK