Abstract
Proteins are the most versatile macromolecules in living systems and perform crucial biological functions. With the advent of the post-genomic era, next-generation sequencing is performed routinely at the population scale for a variety of species. The challenge is to determine, at scale, the functions of proteins that have not yet been characterized by detailed experimental studies. Experimental identification of protein functions is laborious, time-consuming and resource-intensive. We therefore propose an automated protein function prediction methodology using in silico algorithms trained on carefully curated experimental datasets. We present FunPred 3.0, an improved protein function prediction tool that extends our previous methodology, FunPred 2, and exploits neighborhood properties in the protein–protein interaction network (PPIN) together with physicochemical properties of amino acids. Our method is validated using the available functional annotations in the PPIN of Saccharomyces cerevisiae from the latest Munich Information Center for Protein Sequences (MIPS) dataset. The S. cerevisiae PPIN in the MIPS dataset includes 4,554 unique proteins involved in 13,528 protein–protein interactions after elimination of self-replicating and self-interacting protein pairs. Using FunPred 3.0, we achieve mean precision, recall and F-score values of 0.55, 0.82 and 0.66, respectively. FunPred 3.0 is then used to predict the functions of proteins with incomplete or missing functional annotations in the S. cerevisiae MIPS dataset. The method can also predict the subcellular localization of proteins along with their corresponding functions. The code and complete prediction results are freely available at https://github.com/SovanSaha/FunPred-3.0.git.
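The evaluation metrics quoted in the abstract can be illustrated with a short Python sketch. It assumes the standard set-based definitions (precision = correctly predicted functions / all predicted functions, recall = correctly predicted functions / all known functions, F-score = harmonic mean of the two); the function names `precision_recall_f` and `harmonic_f` and the toy labels are hypothetical and are not taken from the FunPred 3.0 code. Because the reported values are means, the F-score computed from the mean precision and recall is only a consistency check; it need not exactly equal a mean of per-protein F-scores.

```python
def harmonic_f(precision: float, recall: float) -> float:
    """F-score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted: set[str], known: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F-score for one protein's function labels."""
    hits = len(predicted & known)  # functions predicted correctly
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(known) if known else 0.0
    return precision, recall, harmonic_f(precision, recall)


# Toy example (hypothetical labels): two functions predicted, one of which is known.
print(precision_recall_f({"transcription", "transport"}, {"transcription"}))

# Consistency check using the reported mean precision (0.55) and recall (0.82).
print(round(harmonic_f(0.55, 0.82), 2))  # 0.66
```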
Highlights
Proteins with similar functions are more likely to interact
The protein–protein interaction network (PPIN) of yeast consists of 4,554 unique proteins involved in 13,528 protein–protein interactions after the elimination of self-replicating and self-interacting protein pairs
After network refinement by applying node and edge weight thresholds, non-essential proteins and unreliable edges are eliminated, and the initial PPIN is reduced to approximately 3,174 unique proteins and 6,936 protein–protein interactions considering three threshold levels; from this refined network, FunPred 3.0_Clust forms protein clusters to generate the test set of proteins
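The preprocessing and refinement steps described in the highlights can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FunPred 3.0 implementation. It treats "self-replicating" pairs as duplicate entries of the same interaction (in either order), takes node and edge weights as precomputed inputs because their definitions are not given here, and applies a single threshold level; calling `refine_network` once per threshold pair would correspond to the three threshold levels, whose actual values are not stated in this section. The function names, threshold values and toy proteins P1 to P4 are hypothetical.

```python
def clean_interactions(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop self-interactions (A-A) and duplicate pairs (A-B repeated, or B-A).

    Assumption: "self-replicating" pairs are read here as duplicate entries.
    """
    seen: set[frozenset[str]] = set()
    cleaned = []
    for a, b in pairs:
        if a == b:
            continue  # self-interacting pair
        key = frozenset((a, b))  # order-independent identity of the interaction
        if key in seen:
            continue  # duplicate of an interaction already kept
        seen.add(key)
        cleaned.append((a, b))
    return cleaned


def refine_network(edges, edge_weight, node_weight, edge_threshold, node_threshold):
    """Keep edges whose own weight and both endpoint weights meet the thresholds.

    Weights are assumed to be precomputed (hypothetical inputs).
    Returns the surviving proteins and edges.
    """
    kept = [
        (a, b)
        for a, b in edges
        if edge_weight[frozenset((a, b))] >= edge_threshold
        and node_weight[a] >= node_threshold
        and node_weight[b] >= node_threshold
    ]
    proteins = {p for edge in kept for p in edge}
    return proteins, kept


# Toy network (hypothetical data).
pairs = [("P1", "P2"), ("P2", "P1"), ("P3", "P3"), ("P2", "P3"), ("P3", "P4")]
clean = clean_interactions(pairs)
edge_weight = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.4,
    frozenset(("P3", "P4")): 0.8,
}
node_weight = {"P1": 0.7, "P2": 0.9, "P3": 0.6, "P4": 0.2}
proteins, kept = refine_network(clean, edge_weight, node_weight,
                                edge_threshold=0.5, node_threshold=0.5)
print(clean)   # [('P1', 'P2'), ('P2', 'P3'), ('P3', 'P4')]
print(kept)    # [('P1', 'P2')]
```

In this toy run, the P2–P3 edge is removed by the edge threshold and the P3–P4 edge by the node threshold on P4, so only the P1–P2 interaction survives.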
Summary
Proteins with similar functions are more likely to interact. If the function of one protein is known, the functions of an un-annotated binding partner may be either assigned experimentally or predicted computationally (Chatterjee et al, 2011a, 2011b; Moosavi, Rahgozar & Rahimi, 2013; Prasad et al, 2017; Saha et al, 2012, 2014; Sriwastava, Basu & Maulik, 2015). Recent work by Guoxian Yu and co-authors (Yu, Zhu & Domeniconi, 2015) explored the incomplete label problem in a hierarchical manner using function correlation. Another approach for predicting protein function, proposed by Piovesan et al (2015), combines three sources of information: PPIN data, protein domains and sequence. While most predictive models focus on the most closely related proteins in the neighborhood of the test protein, Reinders, Van Ham & Makrodimitris (2018) focus on less similar proteins. By applying label-space dimensionality reduction techniques, they show that although these proteins are less similar, they are quite informative and play an important role in protein function prediction. Other notable works in this field include Wang et al (2018) and Fa et al (2018).
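The premise that interacting proteins tend to share functions can be illustrated with a simple neighbor-counting (majority-vote) baseline, sketched below in Python. This is a generic guilt-by-association example under stated assumptions, not FunPred 3.0's scoring, which additionally draws on neighborhood properties and physicochemical properties of amino acids. The function `predict_functions`, its `top_k` parameter and the toy network are hypothetical.

```python
from collections import Counter


def predict_functions(protein: str,
                      adjacency: dict[str, list[str]],
                      annotations: dict[str, list[str]],
                      top_k: int = 1) -> list[str]:
    """Rank candidate functions for `protein` by how many of its annotated
    interaction partners carry each function (majority vote)."""
    votes: Counter = Counter()
    for neighbor in adjacency.get(protein, []):
        # Un-annotated neighbors contribute no votes.
        votes.update(annotations.get(neighbor, []))
    # Ties keep the order in which functions were first encountered.
    return [function for function, _ in votes.most_common(top_k)]


# Toy network (hypothetical): P0 is un-annotated and interacts with P1, P2 and P3.
adjacency = {"P0": ["P1", "P2", "P3"]}
annotations = {
    "P1": ["transcription"],
    "P2": ["transcription", "cell cycle"],
    "P3": ["transport"],
}
print(predict_functions("P0", adjacency, annotations))  # ['transcription']
```

Here "transcription" receives two votes (from P1 and P2), so it is ranked first for P0; a protein with no annotated neighbors receives an empty prediction.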