Abstract
Small molecule lipophilicity is often included in generalized rules for medicinal chemistry. These rules aim to reduce time, effort, costs, and attrition rates in drug discovery, allowing the rejection or prioritization of compounds without the need for synthesis and testing. The availability of abundant, high-quality training data can be a major limiting factor in building effective machine learning property predictors. We use transfer learning to work around this problem: the model first learns from a large set of low-accuracy predicted logP values and is then fine-tuned on a small, accurate dataset of 244 druglike compounds. The result is MRlogP, a neural network-based predictor of logP capable of outperforming state-of-the-art freely available logP prediction methods for druglike small molecules. MRlogP achieves an average root mean squared error of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively. We have made the trained neural network predictor and all associated code for descriptor generation freely available. In addition, MRlogP may be used online via a web interface.
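The two-stage training idea described in the abstract can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the model is a one-feature linear regressor rather than MRlogP's neural network, the data are synthetic, and the names `train`, `predict`, `rmse`, and `true_logp` are hypothetical. It only shows the pattern of pretraining on many biased, noisy labels and then fine-tuning on 244 accurate ones.

```python
import math
import random


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def train(weights, xs, ys, learning_rate, epochs):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    w, b = weights
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(weights, xs):
    w, b = weights
    return [w * x + b for x in xs]


def true_logp(x):
    """Toy ground truth (assumption): logP as a linear function of one descriptor."""
    return 1.5 * x - 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Stage 1 data: many labels from a hypothetical low-accuracy predictor,
# modeled here as the true value plus a systematic offset and noise.
large_x = [rng.uniform(-2, 2) for _ in range(1000)]
large_y = [true_logp(x) + 0.6 + rng.gauss(0, 0.5) for x in large_x]

# Stage 2 data: a small set of accurate labels (244, as in the abstract).
small_x = [rng.uniform(-2, 2) for _ in range(244)]
small_y = [true_logp(x) + rng.gauss(0, 0.05) for x in small_x]

# Pretrain from scratch, then continue training from the pretrained weights
# with a smaller learning rate on the accurate set.
pretrained = train((0.0, 0.0), large_x, large_y, learning_rate=0.05, epochs=200)
fine_tuned = train(pretrained, small_x, small_y, learning_rate=0.01, epochs=200)

rmse_before = rmse(predict(pretrained, small_x), small_y)
rmse_after = rmse(predict(fine_tuned, small_x), small_y)
print(f"RMSE before fine-tuning: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

In this toy setting the fine-tuning stage mainly removes the systematic offset inherited from the noisy labels. The error here is measured on the fine-tuning set itself for brevity; a real evaluation would use separate test sets, as the abstract does with Reaxys and PHYSPROP.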
Highlights
Methods of logP prediction can be broadly placed into two classes, substructure and whole molecule approaches [9]. The classes are separated by one fundamental assumption: either a molecule’s lipophilicity is additive, or it is not and is more complex than a sum of discrete substructure contributions.
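The additive assumption behind substructure methods can be sketched as a weighted sum over fragment counts. This is a minimal illustration only: the contribution values, the fragment names, and the `additive_logp` function are hypothetical and are not taken from any published parameter set.

```python
# Hypothetical contribution table: illustrative numbers only, not taken from
# any published logP method.
FRAGMENT_CONTRIBUTIONS = {"CH3": 0.5, "CH2": 0.4, "OH": -1.0}


def additive_logp(fragment_counts):
    """Estimate logP as the sum of per-fragment contributions times their counts."""
    return sum(
        FRAGMENT_CONTRIBUTIONS[name] * count
        for name, count in fragment_counts.items()
    )


# A molecule described only by how many of each fragment it contains.
estimate = additive_logp({"CH3": 1, "CH2": 1, "OH": 1})
print(f"Additive logP estimate: {estimate:.2f}")
```

A whole molecule approach rejects this decomposition and instead maps descriptors of the entire structure to a single prediction, so no per-fragment table of this kind exists.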
Many logP predictors are freely available, including substructure-based methods, such as ALOGP [10], XLOGP3 [11], and JPlogP [12], along with programs using whole molecule methods such as ALOGPS [13], MLOGP [14], VEGA [15], and UFZ-LSER [16]. Both ALOGP [10] and XLOGP3 [11] adopted an atom-additive method for logP prediction, whereas XLOGP3 [11] utilizes larger molecular fragments, applying further correction factors to deal with intramolecular interactions.
Summary
Common rulesets used in drug discovery and medicinal chemistry, such as Lipinski’s “rule of five” [1,2] and Oprea’s “rule of three” [3,4], aggregate properties of a molecule to predict a further property, such as in vivo absorption or how lead-like and suitable for medicinal chemistry efforts a molecule is. Compound lipophilicity is commonly expressed as the logarithm of the partition coefficient of a compound in an octanol/water system, commonly referred to as logP. Expressing the partition coefficient on a log scale means that hydrophobic compounds have a positive logP and hydrophilic compounds have a negative logP. As well as being aggregated with other properties as input to predictors, logP is used on its own to predict in vivo localization [6] and barrier permeability [7]. Hann and Keserü assessed approved drugs and identified logP, along with molecular weight, as strong predictors of a compound achieving approved drug status [8]. Experimental determination of logP can be performed using a range of methods.
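As a small worked illustration of the definition above, logP can be computed as the base-10 logarithm of the ratio of a compound's equilibrium concentration in octanol to that in water. The function name `log_p` and the example concentrations are assumptions chosen for illustration, not measured values.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log10 of the octanol/water partition coefficient.

    Both concentrations must be positive and expressed in the same units.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# A compound that prefers octanol (hydrophobic) gives a positive value;
# one that prefers water (hydrophilic) gives a negative value.
hydrophobic = log_p(100.0, 1.0)
hydrophilic = log_p(1.0, 100.0)
print(f"hydrophobic: {hydrophobic:+.1f}, hydrophilic: {hydrophilic:+.1f}")
```

A compound distributed equally between the two phases has a partition coefficient of 1 and therefore a logP of 0, which is why the sign of logP separates hydrophobic from hydrophilic compounds.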