Abstract

Medicinal chemistry and drug design efforts can be assisted by machine learning (ML) models that relate the molecular structure to compound properties. Such quantitative structure-property relationship models are generally trained on large data sets that include diverse chemical series (global models). In the pharmaceutical industry, these ML global models are available across discovery projects as an "out-of-the-box" solution to assist in drug design, synthesis prioritization, and experiment selection. However, drug discovery projects typically focus on confined parts of the chemical space (e.g., chemical series), where global models might not be applicable. Local ML models are sometimes generated to focus on specific projects or series. Herein, ML-based global models, local models, and hybrid global-local strategies were benchmarked. Analyses were done for more than 300 drug discovery projects at Novartis and ten absorption, distribution, metabolism, and excretion (ADME) assays. In this work, hybrid global-local strategies based on transfer learning approaches were proposed to leverage both historical ADME data (global) and project-specific data (local) to adapt model predictions. Fine-tuning a pretrained global ML model (used for weights' initialization, WI) was the top-performing method. Average improvements of mean absolute errors across all assays were 16% and 27% compared with global and local models, respectively. Interestingly, when the effect of training set size was analyzed, WI fine-tuning was found to be successful even in low-data scenarios (e.g., ∼10 molecules per project). Taken together, this work highlights the potential of domain adaptation in the field of molecular property predictions to refine existing pretrained models on a new compound data distribution.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call