Prediction of protein-protein interaction sites by means of ensemble learning and weighted feature descriptor.

Xiuquan Du, Shiwei Sun, Changlin Hu, Xinrui Li, Junfeng Xia

doi:10.1186/s40709-016-0046-7

Open Access

https://doi.org/10.1186/s40709-016-0046-7

Abstract

Background: Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics. Many computational methods have been explored for the large-scale prediction of protein–protein interaction sites based on various data types, including protein sequence, structural and genomic data. Although much progress has been achieved in recent years, the problem has not yet been satisfactorily solved.

Results: In this work, we present an efficient approach that uses an ensemble learning algorithm with a weighted feature descriptor (EL-WFD) to predict protein–protein interaction sites. The weighted feature descriptor was designed to describe the influence of distance between neighboring residues and interaction sites. The results on two datasets (Hetero and Homo) show that the proposed method achieves 83.8 % recall and 96.3 % precision on the Hetero dataset and 84.2 % recall and 96.3 % precision on the Homo dataset. On both datasets, our method tends to obtain a higher Matthews correlation coefficient than the state-of-the-art random forest method.

Conclusions: The experimental results show that the EL-WFD method is effective in predicting protein–protein interaction sites. The novel weighted feature descriptor proved promising in discovering interaction sites. Overall, the proposed method can be considered a powerful new tool for predicting protein–protein interaction sites with excellent performance.
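The recall, precision and Matthews correlation coefficient quoted in the abstract are standard binary-classification measures. The sketch below shows how they could be computed from confusion-matrix counts; the function name `binary_metrics` and the example counts are illustrative assumptions, not values or code from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives (here, residues labelled
    as interaction sites or non-sites).
    """
    # Precision: fraction of predicted sites that are real sites.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real sites that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any
    # marginal total is zero (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts, chosen only to illustrate the call.
precision, recall, mcc = binary_metrics(tp=8, fp=2, tn=8, fn=2)
print(precision, recall, mcc)
```

The Matthews correlation coefficient is often preferred on imbalanced data such as interaction-site labels, because it uses all four cells of the confusion matrix.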

Highlights

Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics
Machine learning methods such as support vector machine (SVM) [6,7,8], neural networks (NN) [9,10,11,12], Bayesian networks (BN) [13,14,15,16], random forests (RF) [17, 18], conditional random fields (CRF) [19], extreme learning machine (ELM) [20] and L1-logreg classifier [21] have been successfully applied to predicting binding sites
Five-fold cross-validation was used to evaluate the performance of different methods on the Train Set (TRS)
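As a rough illustration of the five-fold evaluation mentioned in the highlights, the sketch below splits sample indices into five folds and yields one train/test partition per fold. The helper name `k_fold_indices`, the shuffling seed and the sample count are assumptions made for the example; the paper's actual splitting procedure is not described here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training set is every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical usage with 10 samples: each sample is held out exactly once.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, so averaging a metric over the five folds uses every training example for evaluation once.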

Summary

Introduction

Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics. A number of computational methods have been proposed for predicting interaction sites in proteins based on sequence information [3, 4], 3D structure information [5] or a combination of the two. Machine learning methods such as support vector machine (SVM) [6,7,8], neural networks (NN) [9,10,11,12], Bayesian networks (BN) [13,14,15,16], random forests (RF) [17, 18], conditional random fields (CRF) [19], extreme learning machine (ELM) [20] and L1-logreg classifier [21] have been successfully applied to predicting binding sites.

Methods

Results

Conclusion