Abstract

Natively unstructured regions are a common feature of eukaryotic proteomes. Between 30% and 60% of proteins are predicted to contain long stretches of disordered residues, and not only have many of these regions been confirmed experimentally, but they have also been found to be essential for protein function. In this study, we directly address the potential contribution of protein disorder in predicting protein function using standard Gene Ontology (GO) categories. Initially we analyse the occurrence of protein disorder in the human proteome and report ontology categories that are enriched in disordered proteins. Pattern analysis of the distributions of disordered regions in human sequences demonstrated that the functions of intrinsically disordered proteins are both length- and position-dependent. These dependencies were then encoded in feature vectors to quantify the contribution of disorder in human protein function prediction using Support Vector Machine classifiers. The prediction accuracies of 26 GO categories relating to signalling and molecular recognition are improved using the disorder features. The most significant improvements were observed for kinase, phosphorylation, growth factor, and helicase categories. Furthermore, we provide predicted GO term assignments using these classifiers for a set of unannotated and orphan human proteins. In this study, the importance of capturing protein disorder information and its value in function prediction is demonstrated. The GO category classifiers generated can be used to provide more reliable predictions and further insights into the behaviour of orphan and unannotated proteins.
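
As a rough illustration of how length- and position-dependent disorder could be encoded in a feature vector, the sketch below counts disordered segments by length class and by relative position along the sequence. It is not the encoding used in the study: the bin edges in `LENGTH_BINS`, the three-way positional split in `N_POSITION_BINS`, and the function names are assumptions chosen for the example, and the per-residue disorder mask is assumed to come from any disorder predictor.

```python
from itertools import groupby

# Hypothetical settings for illustration only; the study's actual
# length ranges and sequence partitions are not stated in this text.
LENGTH_BINS = [(1, 10), (11, 30), (31, None)]  # inclusive residue counts
N_POSITION_BINS = 3  # e.g. N-terminal, middle, C-terminal thirds


def disordered_segments(mask):
    """Return (start, length) for each run of True in a per-residue mask."""
    segments = []
    index = 0
    for is_disordered, run in groupby(mask):
        run_length = len(list(run))
        if is_disordered:
            segments.append((index, run_length))
        index += run_length
    return segments


def disorder_features(mask):
    """Count disordered segments per (length bin, position bin) pair.

    The result is a flat list with one count per length bin and
    position bin, ordered by length bin first.
    """
    n = len(mask)
    counts = [[0] * N_POSITION_BINS for _ in LENGTH_BINS]
    for start, length in disordered_segments(mask):
        # Place the segment by the relative position of its midpoint.
        midpoint = start + (length - 1) / 2
        pos_bin = min(int(midpoint / n * N_POSITION_BINS), N_POSITION_BINS - 1)
        for i, (low, high) in enumerate(LENGTH_BINS):
            if length >= low and (high is None or length <= high):
                counts[i][pos_bin] += 1
                break
    return [count for row in counts for count in row]


# A 60-residue toy mask: a short N-terminal segment and a long C-terminal one.
example_mask = [True] * 5 + [False] * 20 + [True] * 35
example_features = disorder_features(example_mask)
```

Vectors of this kind, one per protein, could then be passed to any Support Vector Machine implementation alongside other sequence-derived features.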

Highlights

  • One of the challenges of the post-genomic era is to predict the function of a protein given its amino acid sequence

  • Some of the biological process (BP) categories relating to transcription and the Transcription factor molecular function (MF) category could be recognised with sensitivities of >50% at false positive rates of less than 10%, yielding Matthews correlations of >0.3

  • The aim of this study was to investigate the contribution of protein disorder features in protein function prediction
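
The performance measures quoted in the highlights (sensitivity, false positive rate, and Matthews correlation) can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are invented for illustration and are not taken from the study.

```python
import math


def sensitivity(tp, fn):
    """Fraction of true members of a category that are recovered."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of non-members that are wrongly assigned to the category."""
    return fp / (fp + tn)


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: 100 category members and 100 non-members.
tp, fn, fp, tn = 50, 50, 10, 90
sens = sensitivity(tp, fn)
fpr = false_positive_rate(fp, tn)
mcc = matthews_correlation(tp, tn, fp, fn)
```

Because the Matthews correlation uses all four cells, it depends on the ratio of members to non-members as well as on sensitivity and false positive rate, so the same pair of rates can give different correlations for rare and common GO categories.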

Introduction

One of the challenges of the post-genomic era is to predict the function of a protein given its amino acid sequence. Around 35% of proteins cannot be accurately annotated by homology-based transfer methods [3], highlighting the need for function prediction methods that are independent of sequence similarity. ProtFun [4,5] is an ab initio, feature-based protein function prediction method that addresses the annotation of orphan proteins and is applicable to any protein whose sequence is known. Similar approaches that use structural properties and sequence information have been reported for the prediction of enzyme classes [6,7]. One advantage of this type of approach is that the features important for recognising different function classes can be identified and quantified.