Ligandability and druggability assessment via machine learning

Francesco Di Palma, Carlo Abate, Sergio Decherchi, Andrea Cavalli

doi:10.1002/wcms.1676

Abstract

Drug discovery is a daunting and failure‐prone task. Two critical steps in this field are the identification of the biological target and of its binding pocket, as they heavily determine the subsequent effort of selecting a putative ligand, most often a small molecule. Finding “ligandable” pockets, namely protein cavities that may accept a drug‐like binder, is instrumental to the more general, drug‐discovery‐oriented “druggability” estimation process. While high‐throughput experimental techniques exist to identify putative binding sites other than the orthosteric one, these techniques are relatively expensive and not widely available in laboratories. In this regard, computational means of detecting ligandable pockets are attractive for their low cost and speed. These methods can become, in principle, particularly predictive when supported by machine learning methodologies that provide the modeling framework. As with any data‐driven effort, the outcome critically depends on the input data, its featurization, and any associated biases. In addition, the machine learning task (supervised or unsupervised), the learning method, and the possible use of molecular dynamics data considerably shape the inherent assumptions of the modeling step. Defining a proper quantitative thermodynamic and/or kinetic score (or label) is key to the modeling process; here we review the literature and propose residence time as a novel, ideal indicator of ligandability. Interestingly, the vast majority of methods take neither kinetics nor thermodynamics into account when devising predictors.

This article is categorized under: Data Science > Artificial Intelligence/Machine Learning; Structure and Mechanism > Computational Biochemistry and Biophysics; Data Science > Chemoinformatics

Full Text