Abstract
Choosing the right function prediction tools The vast majority of known proteins have not yet been characterized experimentally, and there is very little that is known about their function. New unannotated sequences are added to the databases at a pace that far exceeds the one in which they are annotated in the lab. Computational biology offers tools that can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history, and their association with other proteins. In this contribution, we attempt to provide a framework that will enable biologists and computational biologists to decide which type of computational tool is appropriate for the analysis of their protein of interest, and what kind of insights into its function these tools can provide. In particular, we describe computational methods for predicting protein function directly from sequence or structure, focusing mainly on methods for predicting molecular function. We do not discuss methods that rely on sources of information that are beyond the protein itself, such as genomic context [1], protein–protein interaction networks [2], or membership in biochemical pathways [3]. When choosing a tool for function prediction, one would typically want to identify the best performing tool. However, a quantitative comparison of different tools is a tricky task. While most developers report their own assessment of their tool, in most cases there are no standard datasets and generally agreed-upon measures and criteria for benchmarking function prediction methods. In the absence of independent benchmarks, comparing the figures reported by the developers is almost always comparing oranges and apples (for discussion of this problem see [4]). Therefore, we refrain from reporting numerical assessments of specific methods. For those cases in which independent assessment of performance is available, we refer the reader to the original publications. Finally, we discuss only methods that are either accessible as Web servers or freely available for download (relevant Web links can be found in Table S1).
Highlights
When predicting function automatically on a large scale, this problem is intensified by the need to standardize and quantitatively assess the similarity of functions between proteins
Several large-scale projects attempted to respond to this challenge by building classification systems or ontologies of biological functions
When attempting to predict the function of an unannotated protein based on its homology to an annotated one, one should search for orthologs rather than paralogs (Figure 1A)
Summary
Computational biology offers tools that can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history, and their association with other proteins. While most developers report their own assessment of their tool, in most cases there are no standard datasets and generally agreed-upon measures and criteria for benchmarking function prediction methods. When predicting function automatically on a large scale, this problem is intensified by the need to standardize and quantitatively assess the similarity of functions between proteins. Several large-scale projects attempted to respond to this challenge by building classification systems or ontologies of biological functions (see [8,9] for review) One such enterprise was launched as early as 1955 by the International Congress of Biochemistry, which created the Enzyme Commission to come up with a nomenclature for enzymes. GO has become the standard for assessing the performance of function prediction methods
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.