Abstract

The study of protein-protein interactions (PPIs) is important for understanding cellular functions. However, detecting PPIs in the laboratory is both time-consuming and expensive. For this reason, much recent effort has gone into developing computational techniques for predicting PPIs, which can complement laboratory procedures and provide an inexpensive way of predicting the most likely set of interactions at the scale of an entire proteome. Although much progress has been made in this direction, the problem is still far from solved, and more effective approaches are required to overcome the limitations of current ones. In this study, a novel Multi-scale Local Descriptor (MLD) feature representation scheme is proposed to extract features from a protein sequence. This scheme captures multi-scale local information by varying the length of protein-sequence segments. Based on the MLD, an ensemble learning method, Random Forest (RF), is used as the classifier. The MLD feature representation scheme facilitates the mining of interaction information from multi-scale continuous amino acid segments, making it easier to capture multiple overlapping continuous binding patterns within a protein sequence. When the proposed method is tested on PPI data from Saccharomyces cerevisiae, it achieves a prediction accuracy of 94.72% with 94.34% sensitivity at a precision of 98.91%. Extensive experiments are performed to compare our method with existing sequence-based methods. Experimental results show that our predictor also outperforms several other state-of-the-art predictors on the H. pylori dataset. These results can largely be credited to the learning capabilities of the RF model and the novel MLD feature representation scheme. The experimental results show that the proposed approach is very promising for predicting PPIs and can be a useful tool for future proteomic studies.
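
To make the idea of "varying the length of protein-sequence segments" concrete, the following is a minimal Python sketch of one way a multi-scale local descriptor could be computed. It is not taken from the abstract, and several details are assumptions made for illustration: the split into 4 equal parts, the limit of 3 adjacent parts per region, the 7-class amino acid grouping in `GROUPS`, the use of class composition as the only per-region descriptor, and the function names themselves.

```python
# Hypothetical 7-class grouping of the 20 standard amino acids. The abstract
# does not specify a grouping; this one is an assumption for illustration.
GROUPS = ["AGV", "C", "DE", "FILP", "HNQW", "KR", "MSTY"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def multi_scale_regions(seq, n_parts=4, max_span=3):
    """Return contiguous regions covering 1..max_span adjacent parts.

    The sequence is cut into n_parts pieces of near-equal length, then every
    run of adjacent pieces up to max_span long becomes one region, so short
    and long regions overlap along the sequence.
    """
    bounds = [i * len(seq) // n_parts for i in range(n_parts + 1)]
    regions = []
    for span in range(1, max_span + 1):
        for start in range(n_parts - span + 1):
            regions.append(seq[bounds[start]:bounds[start + span]])
    return regions


def composition(region):
    """Fraction of residues in each amino acid class (unknown letters skipped)."""
    counts = [0] * len(GROUPS)
    for aa in region:
        group = AA_TO_GROUP.get(aa)
        if group is not None:
            counts[group] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def mld_features(seq, n_parts=4, max_span=3):
    """Concatenate the per-region descriptors into one feature vector."""
    features = []
    for region in multi_scale_regions(seq, n_parts, max_span):
        features.extend(composition(region))
    return features


if __name__ == "__main__":
    toy_sequence = "ACDEFGHIKLMNPQRSTVWY"  # toy input, not real data
    vector = mld_features(toy_sequence)
    print(len(multi_scale_regions(toy_sequence)), len(vector))
```

Under these assumptions, each protein yields 9 overlapping regions and a 63-dimensional vector. A protein pair could then be represented by concatenating the two vectors before passing them to a random forest classifier, although that pairing step is also an assumption here.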

Highlights

  • Protein-protein interactions (PPIs) play a key role in various biological processes and functions in living cells, including metabolic cycles, DNA transcription and replication, and signalling cascades [1,2,3]

  • In this paper we propose a sequence-based approach for the prediction of protein-protein interactions using a random forest (RF) model combined with a novel multi-scale local descriptor (MLD) protein feature representation

  • To evaluate the performance of the proposed approach, a total of 8 different PPI datasets are used in our experiments: two from S. cerevisiae, two from H. pylori, and one each from C. elegans, E. coli, H. sapiens, and M. musculus


Summary

Introduction

Protein-protein interactions (PPIs) play a key role in various biological processes and functions in living cells, including metabolic cycles, DNA transcription and replication, and signalling cascades [1,2,3]. Because the conjoint triad method cannot take neighbouring effects into account, and interactions usually occur between discontinuous amino acid segments in the sequence, Guo et al., in other work, developed a method based on SVM and auto covariance to extract interaction information from discontinuous amino acid segments in the sequence [35]. Their method yielded a prediction accuracy of 86.55% when applied to predicting Saccharomyces cerevisiae PPIs. Our prediction model is also assessed using the independent Helicobacter pylori PPI dataset and yields 88.30% prediction accuracy, which further demonstrates the effectiveness of our method.
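
As a rough illustration of how auto covariance can relate residues that are far apart in a sequence, here is a short Python sketch using the usual definition: the mean product of centred property values at positions separated by a given lag. Whether this matches the formulation in [35] exactly is an assumption, and the function name, the `max_lag` parameter, and the toy input values are hypothetical.

```python
def auto_covariance(values, max_lag):
    """Auto covariance of a numeric profile for lags 1..max_lag.

    `values` is a protein sequence already mapped to one physicochemical
    property (one number per residue). For each lag, the result is the mean
    of (x[i] - mean) * (x[i + lag] - mean) over all valid positions i.
    """
    n = len(values)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy numbers standing in for a per-residue property profile (not real data).
    profile = [1.0, 2.0, 3.0, 4.0]
    print(auto_covariance(profile, max_lag=2))
```

Each lag contributes one number per property, so the resulting vector has a fixed length regardless of how long the protein is, which is what allows sequences of different lengths to be fed to a classifier such as an SVM.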

Results
Evaluation measures
Conclusions
Methods