Abstract

Predicting the solubility of drugs usually relies on several open-source or commercially available computer programs across the various calculation steps. Popular statistics for indicating the strength of a prediction model include the coefficient of determination (r²), Pearson's linear correlation coefficient (rPearson), and the root-mean-square error (RMSE), among many others. Different programs may calculate these statistics using slightly different definitions. This commentary briefly reviews the definitions of three types of r² and RMSE statistics (model validation, bias compensation, and Pearson) and shows how the choice of statistical index changes the way systematic errors arising from shortcomings in solubility prediction models are indicated. The indices we have employed in recently published papers on the prediction of solubility of druglike molecules were unclear, especially for drugs from 'beyond the Rule of 5' chemical space, where simple prediction models showed a distinctive 'bias-tilt' type of systematic scatter.

Highlights

  • The ubiquitous coefficient of determination (r²) and root-mean-square error (RMSE) are statistics that quantify the strength of a physical property prediction model [1,2,3,4]

  • The commentary confines the discussion to statistics derived by linear regression of scatter plots of log S0Obs vs. log S0Calc, with observed values treated as dependent variables (y-axis) and calculated values treated as independent variables (x-axis) [3]

  • The General Solubility Equation (GSE) and the Abraham Solvation Equation (ABSOLV) models used to predict the solubility of drugs from ‘beyond the Rule of 5’ chemical space showed (e.g., Figs. 4b, 5b in Ref. [8]) distinctive bias-tilt type scatter, with different degrees of systematic aberrations introduced by the limitations in the models when applied to such large molecules

Introduction

The ubiquitous coefficient of determination (r²) and root-mean-square error (RMSE) are statistics that quantify the strength of a physical property prediction model [1,2,3,4]. Their estimated values depend on random errors in the observed data and on systematic errors arising from the limitations of a particular prediction model. The commentary confines the discussion to statistics derived by linear regression of scatter plots of log S0Obs vs. log S0Calc (log S0 = logarithm of aqueous intrinsic solubility), with observed values treated as dependent variables (y-axis) and calculated values treated as independent variables (x-axis) [3]. Whether r² or RMSE is the better statistic to use is beyond the scope of this commentary.
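To make the distinction between the three families of statistics concrete, the following is a minimal Python sketch. The exact formulas used in the commentary are not reproduced in this excerpt, so the definitions below are assumptions based on common usage: "validation" measures residuals from the identity line (observed = calculated), "bias-compensated" first subtracts the mean residual, and "Pearson" measures residuals from the least-squares line. The function names, the dictionary keys, and the choice of dividing by n (rather than n − 1 or n − 2) are illustrative only.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def solubility_stats(log_s_calc, log_s_obs):
    """Return (r2, RMSE) under three assumed definitions.

    log_s_calc: calculated log S0 values (x-axis, independent variable)
    log_s_obs:  observed log S0 values (y-axis, dependent variable)
    """
    n = len(log_s_obs)
    mean_obs = sum(log_s_obs) / n
    ss_tot = sum((y - mean_obs) ** 2 for y in log_s_obs)

    # Validation: residuals measured from the identity line y = x.
    residuals = [y - x for x, y in zip(log_s_calc, log_s_obs)]
    ss_val = sum(r * r for r in residuals)

    # Bias compensation: remove the mean residual (constant offset) first.
    bias = sum(residuals) / n
    ss_bias = sum((r - bias) ** 2 for r in residuals)

    # Pearson: residuals measured from the least-squares regression line.
    slope, intercept = fit_line(log_s_calc, log_s_obs)
    ss_fit = sum(
        (y - (intercept + slope * x)) ** 2
        for x, y in zip(log_s_calc, log_s_obs)
    )

    return {
        "validation": (1 - ss_val / ss_tot, math.sqrt(ss_val / n)),
        "bias_compensated": (1 - ss_bias / ss_tot, math.sqrt(ss_bias / n)),
        "pearson": (1 - ss_fit / ss_tot, math.sqrt(ss_fit / n)),
        "bias": bias,
        "slope": slope,
        "intercept": intercept,
    }


# Synthetic (not experimental) data with both an offset and a tilt:
# observed = 0.5 * calculated - 1
calc = [-6.0, -5.0, -4.0, -3.0, -2.0]
obs = [-4.0, -3.5, -3.0, -2.5, -2.0]
stats = solubility_stats(calc, obs)
for key in ("validation", "bias_compensated", "pearson"):
    r2, rmse = stats[key]
    print(f"{key:17s} r2 = {r2:6.3f}  RMSE = {rmse:5.3f}")
```

With this synthetic data the points lie exactly on a straight line, so the Pearson-type r² is 1, yet the validation-type r² is negative and the bias-compensated r² is 0, because the line is both offset and tilted relative to the identity line. Under these assumed definitions, the choice of index can report the same systematic error very differently.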

