Can human experts predict solubility better than computers?

Samuel Boobier, Anne Osbourn, John B O Mitchell

doi:10.1186/s13321-017-0250-y


Open Access

https://doi.org/10.1186/s13321-017-0250-y


Abstract

In this study, we design and carry out a survey, asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R² of 0.706. We would not have predicted the relative success of this particular algorithm in advance. We found that the best individual human predictor generated an almost identical prediction quality, with an RMSE of 0.942 log S units and an R² of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors: nine out of ten, compared with around half of the humans. We found that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and statistically insignificant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially as well as machine learning algorithms. We find that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median is a powerful way of benefitting from the wisdom of crowds.

Highlights

We decided in advance that our consensus machine learning predictor would be based on the median solubility predicted for each molecule amongst the ten algorithms
We assessed each machine learning method in terms of the root mean squared error (RMSE), average absolute error (AAE), coefficient of determination (R², the square of the Pearson correlation coefficient), Spearman rank correlation coefficient (ρ), and number of correct predictions within a margin of one log S unit (NC)
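
The median consensus and the five assessment measures named above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy log S values are hypothetical, and treating an error of exactly one log S unit as "correct" is an assumption, since the text does not say whether the margin is inclusive.

```python
import numpy as np

def consensus_median(predictions):
    """Median prediction per molecule; rows are predictors, columns are molecules."""
    return np.median(np.asarray(predictions, dtype=float), axis=0)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def evaluate(y_true, y_pred, margin=1.0):
    """Compute RMSE, AAE, R2, Spearman rho and NC for one predictor."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    pearson = np.corrcoef(y_true, y_pred)[0, 1]
    rho = np.corrcoef(average_ranks(y_true), average_ranks(y_pred))[0, 1]
    return {
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "AAE": float(np.mean(np.abs(errors))),
        "R2": float(pearson ** 2),  # square of the Pearson correlation
        "rho": float(rho),
        # Assumption: an error of exactly `margin` counts as correct.
        "NC": int(np.sum(np.abs(errors) <= margin)),
    }

# Hypothetical toy data: five molecules, three predictors (log S units).
y_true = [-1.0, -2.5, -4.0, -3.0, -0.5]
predictions = [
    [-1.5, -2.0, -5.5, -3.0, -1.0],
    [-0.5, -3.0, -4.5, -2.0, 0.0],
    [-1.0, -2.5, -2.0, -3.5, -0.5],
]

consensus = consensus_median(predictions)
metrics = evaluate(y_true, consensus)
print(consensus)
print(metrics)
```

In this toy case the median discards each predictor's outlying guess for a given molecule, which is one plausible reason a consensus can outperform most of its individual members.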

Summary

Introduction

Solubility is the property of a chemical solute dissolving in a solvent to form a homogeneous system [1]. Water solubility is one of the key requirements of drugs, ensuring that they can be absorbed through the stomach lining and small intestine, eventually passing through the liver into the bloodstream. This means that low solubility is linked with poor bioavailability [2]. Another typical requirement of a drug is delivery in tablet form, for which adequate solubility is again needed.



Journal: Journal of Cheminformatics	Publication Date: Dec 1, 2017
Citations: 51	License type: open-access


