A close look at protein function prediction evaluation protocols.

Indika Kahanda, Christopher S Funk, Asa Ben-Hur, Fahad Ullah, Karin M Verspoor

doi:10.1186/s13742-015-0082-5


Open Access

https://doi.org/10.1186/s13742-015-0082-5


Journal: GigaScience	Publication Date: Sep 14, 2015
Citations: 41	License type: cc-by

Affiliation: Colorado State University, University of Melbourne

Abstract

Background
The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. The CAFA2 task is more realistic, in that it more closely mimics the accumulation of annotations over time. In this study we compare these tasks in terms of their difficulty, and determine whether cross-validation provides a good estimate of performance.

Results
The CAFA2 task is a combination of two subtasks: making predictions on annotated proteins and making predictions on previously unannotated proteins. In this study we analyze the performance of several function prediction methods in these two scenarios. Our results show that several methods (structured support vector machine, binary support vector machines and guilt-by-association methods) do not usually achieve the same level of accuracy on these two tasks as that achieved by cross-validation, and that predicting novel annotations for previously annotated proteins is a harder problem than predicting annotations for uncharacterized proteins. We also find that different methods have different performance characteristics in these tasks, and that cross-validation is not adequate at estimating performance and ranking methods.

Conclusions
These results have implications for the design of computational experiments in the area of automated function prediction and can provide useful insight for the understanding and design of future CAFA competitions.

Electronic supplementary material
The online version of this article (doi:10.1186/s13742-015-0082-5) contains supplementary material, which is available to authorized users.
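The two CAFA2 subtasks described above can be made concrete with a small sketch. It assumes annotations are available as Python dictionaries mapping a protein identifier to a set of GO term identifiers at two time points: t0, when predictions are made, and t1, when they are evaluated. The function name `split_benchmark`, the `Annotations` alias and the example identifiers are hypothetical and are not taken from the paper or from CAFA tooling. The sketch also assumes the inputs are already filtered to the evidence codes of interest, and it ignores propagation of terms up the GO hierarchy, which a real benchmark would need.

```python
from typing import Dict, Set, Tuple

# Hypothetical data layout: protein ID -> set of GO term IDs.
Annotations = Dict[str, Set[str]]


def split_benchmark(
    old: Annotations, new: Annotations
) -> Tuple[Annotations, Annotations]:
    """Split annotations gained between two snapshots into the two subtasks.

    old: annotations known at t0, when predictions are made.
    new: annotations known at t1, when predictions are evaluated.
    Returns (uncharacterized, previously_annotated), each mapping a protein
    to the set of GO terms it gained between t0 and t1.
    """
    uncharacterized: Annotations = {}
    previously_annotated: Annotations = {}
    for protein, terms_t1 in new.items():
        terms_t0 = old.get(protein, set())
        novel = terms_t1 - terms_t0  # annotations that appeared after t0
        if not novel:
            continue  # nothing new to evaluate for this protein
        if terms_t0:
            # Protein already had annotations: only the novel ones are targets.
            previously_annotated[protein] = novel
        else:
            # Protein had no annotations at t0: all of its t1 terms are targets.
            uncharacterized[protein] = novel
    return uncharacterized, previously_annotated


if __name__ == "__main__":
    t0 = {"P1": {"GO:0000001"}, "P2": {"GO:0000002"}}
    t1 = {
        "P1": {"GO:0000001", "GO:0000003"},  # gains one new term
        "P2": {"GO:0000002"},                # no change, so excluded
        "P3": {"GO:0000004"},                # first annotation ever
    }
    uncharacterized, previously_annotated = split_benchmark(t0, t1)
    print(uncharacterized)       # {'P3': {'GO:0000004'}}
    print(previously_annotated)  # {'P1': {'GO:0000003'}}
```

In this sketch, a previously annotated protein contributes only the terms it gained after t0. That is what makes this subtask about predicting novel annotations rather than rediscovering known ones. The GO identifiers in the usage example are placeholders.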

Highlights

In the first CAFA (CAFA1) the participants were provided with a list of protein targets that did not have any previous annotations and were asked to submit computational predictions using their
We identify two tasks in the area of automated function prediction (AFP): prediction of annotations for proteins without previous annotations, and prediction of novel annotations for proteins that already have some annotations associated with them

Summary

Introduction

The recently held Critical Assessment of Function Annotation challenge (CAFA2) required its participants to submit predictions for a large number of target proteins regardless of whether they have previous annotations or not. This is in contrast to the original CAFA challenge in which participants were asked to submit predictions for proteins with no existing annotations. In the first CAFA (CAFA1) the participants were provided with a list of protein targets that did not have any previous annotations and were asked to submit computational predictions using their

Methods

Results

Conclusion