Abstract

In this study, we design and carry out a survey asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a type of neural network known as a multilayer perceptron, gave an RMSE of 0.985 log S units and an R² of 0.706. We would not have predicted the relative success of this particular algorithm in advance. The best individual human predictor achieved almost identical prediction quality, with an RMSE of 0.942 log S units and an R² of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors: nine out of ten, compared with around half of the humans. For either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity, making it a powerful way of benefitting from the wisdom of crowds. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and statistically insignificant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially as well as machine learning algorithms.
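
The median consensus can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `consensus_median`, the input layout (one list of log S predictions per predictor, all in the same molecule order) and the example values are hypothetical assumptions.

```python
from statistics import median


def consensus_median(predictions: dict[str, list[float]]) -> list[float]:
    """Per-molecule median of several predictors' log S values (hypothetical helper)."""
    if not predictions:
        raise ValueError("need at least one predictor")
    columns = list(predictions.values())
    n_molecules = len(columns[0])
    # Assumption: each predictor gives one value per molecule, in the same order.
    if any(len(col) != n_molecules for col in columns):
        raise ValueError("all predictors must cover the same molecules")
    # zip(*columns) groups the i-th prediction from every predictor together.
    return [median(values) for values in zip(*columns)]


# Hypothetical predictions (log S) from three predictors for two molecules.
predictions = {
    "model_a": [-3.1, -5.0],
    "model_b": [-2.8, -4.2],
    "model_c": [-4.0, -4.6],
}
print(consensus_median(predictions))  # [-3.1, -4.6]
```

Taking the median rather than the mean means a single badly wrong predictor cannot drag the consensus far. With an even number of predictors, such as ten algorithms, `statistics.median` returns the mean of the two middle values.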

Highlights

  • We decided in advance that our consensus machine learning predictor would be based on the median solubility predicted for each molecule amongst the ten algorithms

  • We assessed each machine learning method in terms of the root mean squared error (RMSE), average absolute error (AAE), coefficient of determination (R², here the square of the Pearson correlation coefficient), Spearman rank correlation coefficient (ρ), and number of correct predictions within a margin of one log S unit (NC)
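
The five statistics listed above can be computed with the Python standard library alone. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, NC is assumed to count predictions whose absolute error is at most one log S unit (whether the boundary is inclusive is an assumption), and ρ is computed as the Pearson correlation of average ranks, the standard treatment of ties. R² follows the definition in the text (squared Pearson r), which is not the same as the 1 - SS_res/SS_tot form used by some libraries.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def solubility_metrics(y_true: list[float], y_pred: list[float],
                       margin: float = 1.0) -> dict[str, float]:
    """RMSE, AAE, R², Spearman rho and NC for log S predictions."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return {
        "RMSE": math.sqrt(mean([e * e for e in errors])),
        "AAE": mean([abs(e) for e in errors]),
        "R2": pearson(y_true, y_pred) ** 2,  # squared Pearson r, as in the text
        "rho": pearson(average_ranks(y_true), average_ranks(y_pred)),
        "NC": sum(1 for e in errors if abs(e) <= margin),  # assumed inclusive
    }
```

Applying `solubility_metrics` to one predictor's values, or to the output of a median consensus, gives the figures used to compare individual and consensus predictors on an equal footing.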

Summary

Introduction

Solubility is the ability of a chemical solute to dissolve in a solvent to form a homogeneous system [1]. Water solubility is one of the key requirements of drugs, ensuring that they can be absorbed through the stomach lining and small intestine, eventually passing through the liver into the bloodstream. Consequently, low solubility is linked with poor bioavailability [2]. Drugs are also typically delivered in tablet form, which again requires adequate solubility.
