Abstract

BackgroundFunctional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013–2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple, in-house AFP methods. Here, we report benchmark results of our methods obtained in the course of preparation for CAFA2 prior to submitting function predictions for CAFA2 targets.ResultsFor CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performances using the original (older) and updated databases. Performance evaluation for PFP with different settings and ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performances of our prediction methods by enriching the predictions with prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.ConclusionsUpdating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms did not make much improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.Electronic supplementary materialThe online version of this article (doi:10.1186/s13742-015-0083-4) contains supplementary material, which is available to authorized users.

Highlights

  • Functional annotation of novel proteins is one of the central problems in bioinformatics

  • Conventional protein function prediction methods such as BLAST [1], FASTA [2], and SSEARCH [3] rely on the concept of homology

  • We report benchmark results and enhancements made to protein function prediction (PFP) [13, 14] and extended similarity group (ESG) [16] as preparation for the CAFA2 experiment, prior to participation

Read more

Summary

Introduction

Functional annotation of novel proteins is one of the central problems in bioinformatics. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To achieve fast and automatic function annotation of novel/nonannotated proteins, a large variety of automated function prediction (AFP) methods have been developed. There are several methods that thoroughly extract function information from sequence database search results using different strategies. These methods include GOFigure [9], OntoBlast [10], Gotcha [11], GOPET [12], the protein function prediction (PFP) method [13, 14], ConFunc [15], and the extended similarity group (ESG) method [16]. There are other function prediction methods that consider coexpression patterns of genes [20,21,22,23,24], 3D structures of proteins [25,26,27,28,29,30,31,32,33,34], and interacting proteins in large-scale protein-protein interaction networks [35,36,37,38,39,40]

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call