Abstract
Intrinsic solubility is a critical property in the pharmaceutical industry that impacts the in vivo bioavailability of small-molecule drugs. However, solubility prediction with Artificial Intelligence (AI) faces insufficient data, poor data quality, and a lack of unified measurements for comparing AI and physics-based approaches. We collect 7 aqueous solubility datasets and present a dataset curation workflow. When the curated data are evaluated with two expanded deep learning methods, RMSE scores improve on all curated thermodynamic datasets. We also compare expanded Chemprop enhanced with the curated data against a state-of-the-art physics-based approach using Pearson and Spearman correlation coefficients. Expanded Chemprop achieves performance similar to the physics-based approach, with a Pearson coefficient of 0.930 and a Spearman coefficient of 0.947. Pearson and Spearman values also improve steadily as the number of data points increases. In addition, the computational speed of AI models enables quick evaluation of large sets of molecules during the hit identification and lead optimization stages, supporting decision making within the drug discovery time cycle.
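The three evaluation metrics named in the abstract (RMSE, Pearson, and Spearman) can be computed with a few lines of code. The following is a minimal, dependency-free Python sketch; the function names and the example logS values are hypothetical, and this is not the authors' evaluation code. Spearman's rank-order coefficient is computed here as the Pearson coefficient of the ranks, with tied values assigned their average rank.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical measured vs. predicted logS values, for illustration only.
    measured = [-2.1, -3.4, -4.0, -5.2, -6.3]
    predicted = [-2.4, -3.1, -4.3, -5.0, -6.6]
    print(rmse(measured, predicted), pearson(measured, predicted), spearman(measured, predicted))
```

Pearson measures linear agreement with the measured values, while Spearman only checks whether compounds are ranked in the right order, which is often what matters when prioritizing molecules within a series.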
Highlights
Aqueous solubility is one of the critical factors defining the bioavailability of orally administered drugs
Disparate statistical measurements and a lack of high-quality datasets are the main obstacles to an objective comparison between deep learning and Quantum Mechanics-Quantitative Structure Property Relationships (QM-QSPR) approaches for solubility prediction
The evaluation dataset includes four pharmaceutical series totaling 48 molecules, none of which are contained in the 7 collected datasets. Pearson and Spearman rank-order correlation coefficients are used to evaluate the performance of the deep learning and QM-QSPR approaches
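Pooling several solubility datasets and confirming that no evaluation molecule appears in the pooled training data can be sketched as below. This is a hypothetical illustration, not the authors' curation workflow: it assumes each dataset is a dict mapping an already-canonicalized molecule identifier (for example a canonical SMILES) to one logS value, collapses replicate measurements across datasets with the median, and applies an optional, hypothetical `max_spread` filter that drops molecules whose replicate values disagree by more than that many log units.

```python
from statistics import median


def pool_datasets(datasets, max_spread=None):
    """Merge {molecule_id: logS} dicts into one median logS per molecule.

    Assumes molecule_id is already canonical, so the same molecule from
    different datasets shares one key (hypothetical convention).
    """
    grouped = {}
    for data in datasets:
        for mol_id, log_s in data.items():
            grouped.setdefault(mol_id, []).append(log_s)
    pooled = {}
    for mol_id, values in grouped.items():
        # Hypothetical quality filter: skip molecules whose replicate
        # measurements disagree by more than max_spread log units.
        if max_spread is not None and max(values) - min(values) > max_spread:
            continue
        pooled[mol_id] = median(values)
    return pooled


def overlapping_ids(train_ids, test_ids):
    """Return identifiers present in both the training pool and the test set."""
    return sorted(set(train_ids) & set(test_ids))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    pooled = pool_datasets([{"CCO": -0.5, "c1ccccc1": -1.6}, {"CCO": -0.7}])
    leaks = overlapping_ids(pooled, ["CCO", "CCN"])
    print(pooled, leaks)
```

An empty result from `overlapping_ids` is the property claimed for the 48 evaluation molecules; any identifier it returns would have to be removed from training before comparing methods.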
Summary
Aqueous solubility is one of the critical factors defining the bioavailability of orally administered drugs. Over 75% of oral drug development candidates have low solubility according to the Biopharmaceutics Classification System (BCS)[1,2]. To tackle this challenge, researchers are working on drug solubility with both physics-based Quantum Mechanics-Quantitative Structure Property Relationships (QM-QSPR) approaches[3–6] and data-driven artificial intelligence (AI) methods[7–11]. The development of QM-QSPR approaches has produced a large number of computational methods that predict aqueous solubility starting from a molecular structure[3–6]. Most of these methods explore fundamental physics-based rules through a sublimation thermodynamic cycle[2,12] applied to crystalline drug-like molecules. In this approach, solubility results from the interplay between crystal packing and molecular hydration free energy contributions[12–15]. Using it to guide lead optimization[2] relies on crystal structure prediction calculations[19], which may require several days on a powerful cloud infrastructure consisting of millions of CPU cores.
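The sublimation thermodynamic cycle splits dissolution into two steps: the crystal sublimes into the gas phase (ΔG_sub, governed by crystal packing), and the gas-phase molecule is then hydrated (ΔG_hyd). One common way to write the resulting intrinsic solubility is S₀ = exp(−(ΔG_sub + ΔG_hyd)/RT) / V_m, where V_m is the molar volume of the crystal. The Python sketch below assumes that form, with ΔG values in kJ/mol, V_m in L/mol (so S₀ is in mol/L), and 298.15 K by default; the function name is hypothetical and this is not the implementation of any specific QM-QSPR method. In practice the two free energies would come from the expensive crystal structure and hydration calculations described above.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def log10_solubility(dg_sub, dg_hyd, molar_volume, temperature=298.15):
    """log10 of intrinsic solubility (mol/L) from a sublimation cycle.

    Assumed form: S0 = exp(-(dg_sub + dg_hyd) / (R T)) / molar_volume
    dg_sub: sublimation free energy, crystal -> gas (kJ/mol)
    dg_hyd: hydration free energy, gas -> water (kJ/mol)
    molar_volume: molar volume of the crystal (L/mol)
    """
    if molar_volume <= 0:
        raise ValueError("molar_volume must be positive")
    dg_sol = dg_sub + dg_hyd  # overall crystal -> solution free energy
    # Convert exp(-dG/RT) to base 10, then divide by the molar volume.
    return -dg_sol / (R_KJ * temperature * math.log(10)) - math.log10(molar_volume)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(log10_solubility(dg_sub=100.0, dg_hyd=-70.0, molar_volume=0.15))
```

With these hypothetical inputs the net dissolution free energy is 30 kJ/mol, giving a log S₀ of about −4.4; the example makes clear why a few kJ/mol of error in either the crystal packing or the hydration term shifts the prediction by roughly a log unit.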