Abstract

Background: Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics. Many computational methods have been explored for the large-scale prediction of protein–protein interaction sites based on various data types, including protein sequence, structural and genomic data. Although much progress has been achieved in recent years, the problem has not yet been satisfactorily solved.

Results: In this work, we present an efficient approach that uses an ensemble learning algorithm with a weighted feature descriptor (EL-WFD) to predict protein–protein interaction sites. The weighted feature descriptor was designed to describe the distance-dependent influence of neighboring residues on interaction sites. The results on two datasets (Hetero and Homo) show that the proposed method achieves 83.8 % recall and 96.3 % precision on the Hetero dataset, and 84.2 % recall and 96.3 % precision on the Homo dataset. On both datasets, our method tends to obtain a higher Matthews correlation coefficient than the state-of-the-art random forest method.

Conclusions: The experimental results show that the EL-WFD method is effective in predicting protein–protein interaction sites. The novel weighted feature descriptor proved promising for discovering interaction sites. Overall, the proposed method can be considered a powerful new tool for predicting protein–protein interaction sites with excellent performance.
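The recall, precision and Matthews correlation coefficient quoted above are standard binary-classification measures computed from confusion-matrix counts (interface residues as positives). The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_site_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_site_metrics(tp, fp, tn, fn):
    """Return (precision, recall, mcc) from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Undefined ratios
    (zero denominator) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient: ranges from -1 to +1.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical counts for illustration only; not results from the paper.
precision, recall, mcc = binary_site_metrics(tp=80, fp=5, tn=100, fn=15)
print(f"precision={precision:.3f} recall={recall:.3f} MCC={mcc:.3f}")
```

Unlike precision and recall, the Matthews correlation coefficient uses all four counts, which is why it is often preferred when interface and non-interface residues are imbalanced.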

Highlights

  • Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics

  • Machine learning methods such as support vector machines (SVM) [6,7,8], neural networks (NN) [9,10,11,12], Bayesian networks (BN) [13,14,15,16], random forests (RF) [17, 18], conditional random fields (CRF) [19], extreme learning machines (ELM) [20] and the L1-logreg classifier [21] have been successfully applied to predict binding sites

  • Five-fold cross-validation was used to evaluate the performance of different methods on the Train Set (TRS)


Summary

Introduction

Reliable prediction of protein–protein interaction sites is an important goal in the field of bioinformatics. A number of computational methods have been proposed for predicting interaction sites in proteins based on sequence information [3, 4], 3D structure information [5], or a combination of 3D structure and sequence information. Machine learning methods such as support vector machines (SVM) [6,7,8], neural networks (NN) [9,10,11,12], Bayesian networks (BN) [13,14,15,16], random forests (RF) [17, 18], conditional random fields (CRF) [19], extreme learning machines (ELM) [20] and the L1-logreg classifier [21] have been successfully applied to predict binding sites.

