BgN-Score and BsN-Score: bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes.

Hossam M Ashtawy, Nihar R Mahapatra

doi:10.1186/1471-2105-16-s4-s8

Open Access

Journal: BMC Bioinformatics
Publication Date: Feb 23, 2015
Citations: 87
License type: cc-by

Affiliation: Michigan State University

Abstract

Background

Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NNs) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity.

Results

We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs, in the context of the 2007 PDBbind benchmark, which encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score raise Pearson's correlation coefficient between predicted and measured binding affinities by roughly 25% over a state-of-the-art conventional SF (0.804 and 0.816 vs. 0.644). These ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network, the kind traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests.

Conclusions

The ensemble neural network SFs, BgN-Score and BsN-Score, are the most accurate in predicting the binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher when they are used to predict binding affinities of protein-ligand complexes that are related to their training sets.
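The accuracy figures in the abstract are Pearson's correlation coefficients, and the percentage comparisons are relative gains between such coefficients. The following is a minimal sketch of both calculations using only the Python standard library. The function names `pearson_r` and `relative_gain_percent` are illustrative assumptions, not names from the paper, and the toy affinity values are made up purely to exercise the function.

```python
import math


def pearson_r(predicted, measured):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov / math.sqrt(var_p * var_m)


def relative_gain_percent(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Correlation coefficients quoted in the abstract.
ensemble = {"BgN-Score": 0.804, "BsN-Score": 0.816}
baselines = {"conventional SF": 0.644, "single NN": 0.675}

for name, r in ensemble.items():
    for label, base in baselines.items():
        gain = relative_gain_percent(r, base)
        print(f"{name} vs. {label}: {gain:.1f}% higher")

# Toy check of pearson_r on made-up affinities (NOT data from the paper).
toy_measured = [4.2, 5.1, 6.3, 7.8, 9.0]
toy_predicted = [4.5, 5.0, 6.0, 7.5, 9.4]
print(f"toy Pearson r = {pearson_r(toy_predicted, toy_measured):.3f}")
```

With the quoted coefficients, the gains over the conventional SF come out at about 24.8% (BgN-Score) and 26.7% (BsN-Score), and the gains over the single neural network at about 19.1% and 20.9%.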

Highlights

Predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology.
Virtual screening involves docking tens of thousands to millions of ligand candidates into a target protein receptor’s binding site and using a suitable scoring function (SF) to evaluate the binding affinity of each candidate, both to identify the top candidates as drug leads and to perform lead optimization [2]; the same approach is also used for target identification [4].
We recently proposed nonlinear scoring functions based on random forests (RF), boosted regression trees (BRT), support vector machines (SVM), k-nearest neighbors, and multivariate adaptive regression splines (MARS), and compared their ligand scoring and ranking performance against the sixteen conventional SFs considered by Cheng et al. on the same benchmark test sets [16,17].

Summary

Introduction

Predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Due to the prohibitive costs and delays associated with experimental drug discovery, pharmaceutical and biotechnology companies rely on virtual screening using computational molecular docking [1,2,3]. This involves docking tens of thousands to millions of ligand candidates into a target protein receptor’s binding site and using a suitable scoring function (SF) to evaluate the binding affinity of each candidate, both to identify the top candidates as drug leads and to perform lead optimization [2]; the same approach is also used for target identification [4]. This approach has become attractive because of the ever-increasing number of available receptor protein structures and putative ligand drug candidates in publicly accessible databases, such as the Protein Data Bank (PDB) [8], PDBbind [9], and the Cambridge Structural Database (CSD) [10], as well as in corporate repositories.
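The screening step described above, scoring every candidate and keeping the best, can be sketched with a bagged ensemble of small neural networks standing in for the scoring function. This is a toy illustration under stated assumptions, not the authors' BgN-Score implementation: the two-feature descriptors, the synthetic affinity function `toy_affinity`, the network size, the training settings, and all function and ligand names are hypothetical. Bagging here means that each network is trained on a bootstrap resample of the training set and the ensemble score is the average of the individual predictions.

```python
import math
import random


def train_network(samples, hidden=4, epochs=100, lr=0.05, rng=None):
    """Fit a one-hidden-layer tanh network by per-sample gradient descent.

    samples is a list of (features, target) pairs; returns a predict function.
    """
    rng = rng or random.Random(0)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        for x, y in samples:
            h, out = forward(x)
            err = out - y  # gradient of 0.5 * (out - y)^2 with respect to out
            for j in range(hidden):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return lambda x: forward(x)[1]


def train_bagged_ensemble(samples, n_models=5, seed=0):
    """Train each network on a bootstrap resample and average the predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # with replacement
        models.append(train_network(bootstrap, rng=rng))
    return lambda x: sum(m(x) for m in models) / len(models)


def toy_affinity(x):
    """Synthetic 'true' affinity, invented for this example only."""
    return x[0] - 0.5 * x[1]


# Synthetic training set: two hypothetical descriptors per complex.
data_rng = random.Random(1)
train = []
for _ in range(30):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    train.append((x, toy_affinity(x) + data_rng.gauss(0, 0.05)))

score = train_bagged_ensemble(train)

# Score hypothetical ligand candidates and keep the top two as "leads".
candidates = {
    "ligand_A": [0.9, -0.8],
    "ligand_B": [0.0, 0.0],
    "ligand_C": [-0.9, 0.8],
}
ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
top_k = ranked[:2]
print("ranking:", ranked)
print("top candidates:", top_k)
```

Averaging over bootstrap-trained networks is what distinguishes the ensemble from a single network: individual networks vary with their resample and random initialization, and the mean prediction smooths out that variance. A boosting variant would instead train the networks sequentially, each one focusing on the errors left by its predecessors.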

Similar Papers

A Comparative Assessment of Conventional and Machine-Learning-Based Scoring Functions in Predicting Binding Affinities of Protein-Ligand Complexes
Hossam M Ashtawy ... Nihar R Mahapatra
01 Nov 2011

Does Accurate Scoring of Ligands against Protein Targets Mean Accurate Ranking?
Hossam M Ashtawy ... Nihar R Mahapatra
01 Jan 2013

A Comparative Assessment of Ranking Accuracies of Conventional and Machine-Learning-Based Scoring Functions for Protein-Ligand Binding Affinity Prediction
Hossam M Ashtawy ... Nihar R Mahapatra
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 9
01 Sep 2012

A Comparative Assessment of Predictive Accuracies of Conventional and Machine Learning Scoring Functions for Protein-Ligand Binding Affinity Prediction
Hossam M Ashtawy ... Nihar R Mahapatra
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 12
01 Mar 2015
