Abstract

Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional biological experimental methods for identifying SIPs have been developed in the past few years. However, these methods are costly, time-consuming and inefficient, which often limits their use. The development of computational methods has therefore become necessary. In this paper, we propose, for the first time, a novel deep learning model that incorporates a natural language processing (NLP) method to predict potential SIPs from protein sequence information. More specifically, each protein sequence is first re-expressed as a series of k-mers. Then, we obtain a global vectors representation of each protein sequence using an NLP technique. Finally, based on known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments on yeast and human datasets yielded accuracies of 91.45% and 93.12%, respectively. The experimental results show that amino acid semantic information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work could have potential applications in various biological classification problems.
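
As an illustration of the k-mer step mentioned in the abstract, the following minimal Python sketch splits a protein sequence into k-mer "words" that an NLP embedding method could then consume. The function name `kmer_tokens`, the default k = 3, and the overlapping window with stride 1 are assumptions made for illustration; the abstract does not state the values the authors used.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split an amino acid sequence into k-mer tokens.

    k and stride are illustrative assumptions, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive integers")
    seq = sequence.strip().upper()
    # Slide a window of width k along the sequence; sequences shorter
    # than k yield an empty list.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


if __name__ == "__main__":
    # Hypothetical five-residue sequence, used only to show the tokenization.
    print(kmer_tokens("MKVLA"))
```

Each token list can be treated like a sentence of words, which is what allows a word-embedding method to be applied to protein sequences.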

Highlights

  • Proteins perform a vast array of functions within organisms

  • We first assessed the proposed method on the self-interacting proteins (SIPs) extracted from the yeast dataset

  • We separated the datasets, which were mainly composed of characteristic values, into k non-overlapping pieces; each training sample was used k − 1 times in the forest to generate k − 1 class-category lists, which were averaged to produce the final result used as the enhancement feature of that level in the cascade forest
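
The fold bookkeeping in the last highlight can be sketched in Python as follows. This is one literal reading of that sentence, not the authors' implementation: the data are split into k non-overlapping pieces, each sample takes part in training k − 1 times, and the k − 1 resulting class vectors are averaged. The stand-in model `fit_frequency_model` is hypothetical; it only returns the label proportions of its training pieces and ignores the features, so that the sketch stays self-contained. A real cascade level would use forests here.

```python
from collections import Counter


def fit_frequency_model(labels, classes):
    """Hypothetical stand-in for a forest: the label proportions of the training set."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def averaged_class_vectors(labels, k, classes=(0, 1)):
    """Average the k - 1 class vectors each sample receives across the k splits."""
    n = len(labels)
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and at least k samples")
    # k non-overlapping pieces: sample i belongs to piece i % k.
    folds = [list(range(start, n, k)) for start in range(k)]
    sums = [[0.0] * len(classes) for _ in range(n)]
    for held_out in range(k):
        # Train on every piece except the held-out one.
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        model = fit_frequency_model([labels[i] for i in train_idx], classes)
        # Each training sample collects one class vector from this model.
        for i in train_idx:
            for c in range(len(classes)):
                sums[i][c] += model[c]
    # Every sample was a training sample exactly k - 1 times.
    return [[value / (k - 1) for value in row] for row in sums]


if __name__ == "__main__":
    vectors = averaged_class_vectors([1, 0, 1, 0, 1, 0], k=3)
    print(vectors[0])
```

The averaged vectors would then be appended to the original features as input to the next cascade level.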


Summary

Introduction

Proteins perform a vast array of functions within organisms. Their self-interaction must be considered for a full understanding of cell functions and biological phenomena. Identifying interactions between proteins remains an important task because of the large amount of data available in the post-genome era. The prediction of self-interacting proteins (SIPs) offers broad insight into drug target detection [1], drug discovery [2,3], and further biological processes [4]. Previous biological experimental studies [5,6] have many disadvantages, such as high cost, long duration and low efficiency. To predict SIPs efficiently, many researchers have worked to develop new strategies

