Predicting protein-ligand interactions based on bow-pharmacological space and Bayesian additive regression trees

Li Li, Ching Chiek Koh, Nicholas Keone Lee, Haishuai Wang, J B Brown, Huai-Meng Fan, Daniel Reker, Hien-Haw Liow, Hao Dai, Luonan Chen, Dong-Qing Wei

doi:10.1038/s41598-019-43125-6

Abstract

Identifying potential protein-ligand interactions is central to drug discovery: it facilitates the identification of novel drug leads, contributes to advancement from hits to leads, predicts potential off-target explanations for side effects of approved drugs or candidates, and de-orphans phenotypic hits. For the rapid identification of protein-ligand interactions, we present a novel chemogenomics algorithm that combines a new machine learning approach with a novel class of descriptor. The algorithm applies Bayesian Additive Regression Trees (BART) to a newly proposed proteochemical space, termed the bow-pharmacological space. The space spans three distinct sub-spaces that cover the protein space, the ligand space, and the interaction space. The model thereby extends the scope of classical target prediction or chemogenomic modelling, which relies on only one or two of these sub-spaces. Our model demonstrated excellent predictive power, reaching accuracies of up to 94.5–98.4% when evaluated on four human target datasets comprising enzymes, nuclear receptors, ion channels, and G-protein-coupled receptors. BART provided a reliable probabilistic description of the likelihood of interaction between proteins and ligands, which can be used to prioritize the assays to be performed in both the discovery and vigilance phases of small molecule development.
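As an illustration of how a probabilistic output of this kind could feed assay prioritization, the following is a minimal sketch, not the authors' implementation. It assumes that posterior draws of the interaction probability are already available for each protein-ligand pair; the pair names, the draw values, and the helper names `summarize` and `rank_pairs` are all hypothetical.

```python
from statistics import mean

def summarize(draws, lower=0.025, upper=0.975):
    """Return the posterior mean and a crude empirical interval for one pair."""
    ordered = sorted(draws)
    n = len(ordered)
    low = ordered[int(lower * (n - 1))]
    high = ordered[int(upper * (n - 1))]
    return mean(ordered), low, high

# Hypothetical posterior draws of interaction probability (illustrative values only).
posterior_draws = {
    ("protein_A", "ligand_1"): [0.91, 0.88, 0.93, 0.90, 0.86],
    ("protein_A", "ligand_2"): [0.40, 0.65, 0.20, 0.55, 0.70],
    ("protein_B", "ligand_1"): [0.12, 0.08, 0.15, 0.10, 0.05],
}

def rank_pairs(draws_by_pair):
    """Rank pairs by posterior mean interaction probability, highest first."""
    summaries = {pair: summarize(d) for pair, d in draws_by_pair.items()}
    return sorted(summaries.items(), key=lambda item: item[1][0], reverse=True)

ranking = rank_pairs(posterior_draws)
for pair, (m, low, high) in ranking:
    print(pair, round(m, 3), (low, high))
```

Ranking by the posterior mean is only one possible choice; a pair with a wide interval (such as the second one here) might instead be prioritized precisely because the model is uncertain about it.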

Highlights

Exploring protein-ligand interactions is essential to drug discovery and chemical biology in navigating the space of small molecules and their perturbations on biological networks
We describe a novel prediction model that applies Bayesian Additive Regression Trees (BART) and other machine learning methods to combined features from protein, ligand, and interaction information
Prediction based on bow-pharmacological space and BART
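The idea of combining protein, ligand, and interaction information into one feature set can be sketched as a simple concatenation of three descriptor blocks per protein-ligand pair. This is an assumption-laden illustration only: the descriptor values, the identifiers `P1`, `L1`, and so on, the zero default for pairs without interaction information, and the helper `pair_vector` are hypothetical and do not reproduce the descriptors used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical descriptor tables (illustrative values, not real descriptors).
protein_features: Dict[str, List[float]] = {
    "P1": [0.2, 0.7],
    "P2": [0.9, 0.1],
}
ligand_features: Dict[str, List[float]] = {
    "L1": [1.0, 0.0, 1.0],
    "L2": [0.0, 1.0, 1.0],
}
interaction_features: Dict[Tuple[str, str], List[float]] = {
    ("P1", "L1"): [0.5],
    ("P2", "L2"): [0.3],
}

def pair_vector(protein: str, ligand: str,
                default_interaction: Tuple[float, ...] = (0.0,)) -> List[float]:
    """Concatenate protein, ligand, and interaction blocks for one pair.

    Pairs with no interaction entry fall back to a default block
    (an assumption made for this sketch).
    """
    inter = interaction_features.get((protein, ligand), list(default_interaction))
    return protein_features[protein] + ligand_features[ligand] + list(inter)

print(pair_vector("P1", "L1"))
print(pair_vector("P1", "L2"))
```

Each resulting vector would form one row of the input matrix for a learner such as BART, with the known interaction status as the label.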

Summary

Bayesian additive regression trees

Ligand-based methods (e.g., fingerprint similarity searching, pharmacophore models, and machine learning approaches) are increasingly applied in research and development to predict on- and off-target interactions, but they often require large amounts of ligand data to achieve the desired predictive accuracy. Another widely used computational strategy is text mining, which draws on databases of scientific literature such as PubMed[5]. We describe a novel prediction model that applies Bayesian Additive Regression Trees (BART) and other machine learning methods to combined features from protein, ligand, and interaction information. In addition to retrospective analysis, we highlight one exemplary prediction of a novel ligand of the KIF11 protein that was successfully validated using a docking simulation and subsequently confirmed by a crystallography study carried out by an independent research group.
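Fingerprint similarity searching, named above as one of the ligand-based methods, is commonly scored with the Tanimoto coefficient: the number of shared "on" bits divided by the number of bits set in either fingerprint. The sketch below represents each fingerprint as a set of bit indices; the compound names and bit sets are hypothetical and chosen only to make the ranking easy to follow.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(a | b)

# Hypothetical query fingerprint and a small screening library.
query = {1, 4, 7, 9}
library = {
    "cmpd_x": {1, 4, 7, 8},
    "cmpd_y": {2, 3},
    "cmpd_z": {1, 4, 7, 9},
}

# Rank library compounds by similarity to the query, most similar first.
hits = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
for name in hits:
    print(name, tanimoto(query, library[name]))
```

A search of this kind uses ligand information alone, which is why its accuracy depends on how many known ligands are available for the target of interest.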

Results

Discussion

Models and Methods

Author Contributions

Additional Information