Proteochemometric modeling in a Bayesian framework.

Isidro Cortes-Ciriano, Eelke Bart Lenselink, Daniel S Murrell, Gerard JP van Westen, Andreas Bender, Thérèse Malliavin

doi:10.1186/1758-2946-6-35

Abstract

Proteochemometrics (PCM) is an approach to predictive bioactivity modeling that models the relationship between protein and chemical information. Gaussian Processes (GP), based on Bayesian inference, provide the most objective estimation of the uncertainty of the predictions, thus permitting the evaluation of the applicability domain (AD) of the model. Furthermore, the experimental error on bioactivity measurements can be used as input for this probabilistic model.

In this study, we apply GP implemented with a panel of kernels to three diverse, multispecies PCM datasets. The first dataset consists of information on 8 human and rat adenosine receptors with 10,999 small-molecule ligands and their binding affinities. The second consists of the catalytic activity of four dengue virus NS3 proteases on 56 small peptides. For the third, we gathered bioactivity information for small-molecule ligands on 91 aminergic GPCRs from 9 different species, leading to a dataset of 24,593 datapoints with a matrix completeness of only 2.43%.

GP models trained on these datasets are statistically sound, at the same level of statistical significance as Support Vector Machines (SVM), with R₀² values on the external dataset ranging from 0.68 to 0.92, and RMSEP values close to the experimental error. Furthermore, the best GP models, obtained with the normalized polynomial and radial kernels, provide confidence intervals for the predictions that agree with the cumulative Gaussian distribution. GP models were also interpreted on the basis of individual targets and of ligand descriptors. In the dengue dataset, interpreting the model in terms of the amino-acid positions in the tetra-peptide ligands gave biologically meaningful results.
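The statement that predicted confidence intervals agree with the cumulative Gaussian distribution can be checked empirically: for a nominal confidence level, the fraction of held-out observations that fall inside the corresponding central interval should be close to that level. The following is a minimal sketch of such a check, not the authors' procedure. The function name `interval_coverage` and the toy numbers are hypothetical, and it assumes the GP returns a predictive mean and standard deviation for each datapoint.

```python
from statistics import NormalDist


def interval_coverage(y_true, mean, std, levels=(0.68, 0.80, 0.95)):
    """Map each nominal level to the observed fraction of y_true that lies
    inside the central Gaussian interval mean +/- z * std."""
    std_normal = NormalDist()  # standard normal, mu=0 and sigma=1
    coverage = {}
    for level in levels:
        # Two-sided interval: z is the quantile at 0.5 + level / 2.
        z = std_normal.inv_cdf(0.5 + level / 2.0)
        inside = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mean, std))
        coverage[level] = inside / len(y_true)
    return coverage


# Hypothetical toy values (pKi-like units), for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted_mean = [5.1, 6.0, 7.5, 10.0]
predicted_std = [0.5, 0.5, 0.5, 0.5]
print(interval_coverage(observed, predicted_mean, predicted_std))
```

On a well-calibrated model and a reasonably large external set, the observed fractions would be expected to track the nominal levels; large deviations suggest over- or under-confident predictive variances.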

Highlights

The advent of high-throughput (HT) technologies has contributed in recent decades to a vast increase in the data held in proprietary and public bioactivity databases.
PCM Gaussian Process (GP) models agree with the validation criteria: overall, the models obtained for the three datasets with Gaussian Process modeling display statistics in agreement with our validation criteria (Table 2 and Additional file 1: Table S3).
The best GP model for the adenosine receptors dataset was obtained with the normalized polynomial (NP) kernel, exhibiting RMSEP_ext and R₀²_ext values of 0.58 pKi units and 0.75, respectively.
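The two external-set statistics quoted above can be computed from observed and predicted values as sketched below. This is an illustrative assumption rather than the paper's exact formulation, which is not reproduced in this excerpt: RMSEP is taken as the root mean squared error of prediction, and R₀² as a coefficient of determination for a regression line forced through the origin, a definition commonly used in QSAR validation. The function names and toy values are hypothetical.

```python
import math


def rmsep(y_obs, y_pred):
    """Root mean squared error of prediction on an external set."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def r2_through_origin(y_obs, y_pred):
    """R0^2 under the ASSUMED definition: fit y_obs ~ k * y_pred with no
    intercept, then compare residuals with the variance of y_obs."""
    # Least-squares slope of a line through the origin.
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    y_mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - y_mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, for illustration only.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 4.0, 6.0]
print(rmsep(obs, pred), r2_through_origin(obs, pred))
```

Note that the two statistics capture different things: in the toy case the predictions are perfectly proportional to the observations, so the through-origin R₀² is 1 even though the RMSEP is large.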

Summary

Introduction

The advent of high-throughput (HT) technologies has contributed in recent decades to a vast increase in the data held in proprietary and public bioactivity databases. A large amount of protein structure and sequence information has also been collected for numerous species. Chemogenomic techniques [1,2,3] can capitalize on this information by modeling the relationships between the chemical and the biological space. Within chemogenomics, Proteochemometrics (PCM) [6] uses machine learning models to relate compounds to their biomolecular targets (usually proteins). PCM makes it possible to detect compound substructures that confer inhibitory activity against a panel of related biomolecular targets [14].
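One common way to present compound and target information to a single machine learning model is to describe each measured compound-target pair by the concatenation of a compound descriptor vector and a target descriptor vector. The sketch below illustrates that idea together with the notion of matrix completeness (the fraction of all possible compound-target pairs that have a measurement). It assumes simple concatenation, which this excerpt does not confirm as the encoding used in the paper, and all identifiers, descriptor values and activities are hypothetical.

```python
def pcm_rows(compound_desc, target_desc, measurements):
    """Build one feature row per measured (compound, target) pair by
    concatenating the compound descriptors with the target descriptors."""
    rows, labels = [], []
    for compound_id, target_id, activity in measurements:
        rows.append(compound_desc[compound_id] + target_desc[target_id])
        labels.append(activity)
    return rows, labels


def matrix_completeness(measurements, n_compounds, n_targets):
    """Fraction of distinct compound-target pairs that were measured."""
    measured_pairs = {(c, t) for c, t, _ in measurements}
    return len(measured_pairs) / (n_compounds * n_targets)


# Hypothetical toy descriptors and activities, for illustration only.
compounds = {"c1": [1.0, 0.0], "c2": [0.0, 1.0]}
targets = {"t1": [0.2, 0.8, 0.1], "t2": [0.3, 0.6, 0.4]}
data = [("c1", "t1", 7.2), ("c2", "t1", 5.9), ("c1", "t2", 6.4)]

X, y = pcm_rows(compounds, targets, data)
print(X[0], y[0])
print(matrix_completeness(data, len(compounds), len(targets)))
```

Because every row carries both compound and target descriptors, a model trained on such rows can in principle interpolate to compound-target pairs that were never measured, which is what makes sparse matrices usable.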


Journal: Journal of Cheminformatics	Publication Date: Jun 28, 2014
Citations: 97	License type: CC BY 4.0


