MRlogP: Transfer Learning Enables Accurate logP Prediction Using Small Experimental Training Datasets

Yan-Kai Chen, Steven Shave, Manfred Auer

doi:10.3390/pr9112029

Open Access


Journal: Processes
Publication Date: Nov 13, 2021
Citations: 4
License type: CC BY 4.0

Affiliation: University of Edinburgh

Abstract

Small molecule lipophilicity is often included in generalized rules for medicinal chemistry. These rules aim to reduce time, effort, costs, and attrition rates in drug discovery, allowing the rejection or prioritization of compounds without the need for synthesis and testing. The availability of abundant, high-quality training data for machine learning methods can be a major limiting factor in building effective property predictors. We use transfer learning techniques to circumvent this problem, first learning on a large number of low-accuracy predicted logP values and then fine-tuning our model using a small, accurate dataset of 244 druglike compounds to create MRlogP, a neural network-based predictor of logP capable of outperforming state-of-the-art, freely available logP prediction methods for druglike small molecules. MRlogP achieves average root mean squared errors of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively. We have made the trained neural network predictor and all associated code for descriptor generation freely available. In addition, MRlogP may be used online via a web interface.
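The two-stage idea in the abstract (pre-train on many low-accuracy labels, then fine-tune on a few accurate ones) can be illustrated with a deliberately tiny sketch. This is not the MRlogP network or its descriptors: the linear model, the synthetic data, the learning rates, and every name below are assumptions chosen only to show the workflow and the root mean squared error metric.

```python
import math
import random

def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def fit(params, data, lr, epochs):
    """Gradient descent on a toy linear model y = w*x + b, starting from params."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def true_value(x):
    """Hypothetical ground-truth property (stands in for experimental logP)."""
    return 2.0 * x + 1.0

rng = random.Random(0)

# Stage 1 data: many labels from a hypothetical predictor that is biased and noisy.
large_noisy = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    large_noisy.append((x, true_value(x) + 0.5 + rng.gauss(0.0, 0.5)))

# Stage 2 data: a small set of accurate labels.
small_accurate = [(x, true_value(x)) for x in (i / 10 - 1.0 for i in range(21))]

# Held-out accurate points, used for evaluation only.
held_out = [(x, true_value(x)) for x in (-0.85, -0.35, 0.15, 0.65)]

def evaluate(params):
    """RMSE of the model on the held-out accurate points."""
    w, b = params
    return rmse([w * x + b for x, _ in held_out], [y for _, y in held_out])

pretrained = fit((0.0, 0.0), large_noisy, lr=0.1, epochs=200)    # learn from noisy labels
fine_tuned = fit(pretrained, small_accurate, lr=0.05, epochs=200)  # tune on accurate labels

rmse_before = evaluate(pretrained)
rmse_after = evaluate(fine_tuned)
print(f"held-out RMSE before fine-tuning: {rmse_before:.3f}")
print(f"held-out RMSE after fine-tuning:  {rmse_after:.3f}")
```

In this toy setting the pre-trained model inherits the systematic offset of the noisy labels, and fine-tuning on the small accurate set removes it. A real neural network would typically also need decisions about which layers to retrain and how to avoid overfitting the small set.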

Highlights

Methods of logP prediction fall broadly into two classes, substructure-based and whole-molecule approaches [9]. The classes are separated by one fundamental assumption: either a molecule’s lipophilicity is additive, or it is more complex than a sum of discrete substructure contributions
Many logP predictors are freely available, including substructure-based methods, such as ALOGP [10], XLOGP3 [11], and JPlogP [12], along with programs using whole-molecule methods such as ALOGPS [13], MLOGP [14], VEGA [15], and UFZ-LSER [16]. Both ALOGP [10] and XLOGP3 [11] adopt an atom-additive method for logP prediction, although XLOGP3 [11] also utilizes larger molecular fragments, applying further correction factors to deal with intramolecular interactions
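The additive assumption behind substructure-based methods can be written generically as a sum of contributions plus corrections. The notation here is an illustrative assumption, not the formula of any specific tool named above: $n_i$ is the number of occurrences of atom or fragment type $i$, $a_i$ is its fitted contribution, and $c_j$ are optional correction factors.

```latex
\log P \;\approx\; \sum_{i} n_i \, a_i \;+\; \sum_{j} c_j
```

Whole-molecule methods drop this form and instead predict logP from descriptors computed on the entire structure.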

Summary

Introduction

Common rulesets used in drug discovery and medicinal chemistry, such as Lipinski’s “rule of five” [1,2] and Oprea’s “rule of three” [3,4], aggregate properties of a molecule to predict a further property, such as in vivo absorption or how “lead-like” and suitable for medicinal chemistry efforts a molecule is. Compound lipophilicity is commonly expressed as the logarithm of the compound’s partition coefficient in an octanol/water system, referred to as logP. On this log scale, hydrophobic compounds have a positive logP and hydrophilic compounds have a negative logP. As well as being aggregated with other properties as input to predictors, logP is used on its own to predict in vivo localization [6] and barrier permeability [7]. Hann and Keserü assessed approved drugs and identified logP, along with molecular weight, as a strong predictor of a compound achieving approved drug status [8]. Experimental determination of logP can be performed using a range of methods.
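The sign convention described above follows directly from the definition of logP as the base-10 logarithm of the ratio of a compound's equilibrium concentrations in octanol and in water. A minimal sketch, where the function name and the example concentrations are illustrative assumptions rather than values from the paper:

```python
import math

def log_p(conc_octanol, conc_water):
    """Return logP from equilibrium concentrations (same units) in the two phases."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)

# Illustrative values only: a 100-fold preference for octanol versus for water.
hydrophobic = log_p(100.0, 1.0)  # positive: compound favors the octanol phase
hydrophilic = log_p(1.0, 100.0)  # negative: compound favors the water phase
print(hydrophobic, hydrophilic)
```

A compound that partitions equally between the two phases has a partition coefficient of 1 and therefore a logP of 0.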
