The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Naihui Zhou,Kimberley A Lewis,Peter W Rose,Jari Björne,Daniel B Roche,Vladimir Gligorijević,Tomislav Šmuc,David T Jones,Heiko Schoof,Petri Törönen,Da Chen Emily Koo,Mateo Torres,Alfonso E Romero,Alperen Dalkıran,Po-Han Chi,David W Ritchie,Caleb Chandler,Castrense Savojardo,Vedrana Vidulin,Yuxiang Jiang,Branislava Gemović,Wen‐Hung Liao,Silvio C E Tosatto,Marco Notaro,Jeffrey M Yunes,Erica Suh,Neven Šumonja,L Taylor Davis,Miguel Amezola,Rengül Cetin-Atalay,Hafeez Ur Rehman,Tatyana Goldberg,Timothy Bergquist,D Barrie Johnson,Jonathan Dayton,Ilya B Novikov,Alberto Paccanaro,Aashish Jain,Alexandre Renaux,Marco Carraro,Martti Tolvanen,Robert Hoehndorf,Pier Luigi Martelli,Meet Barot,Florian Boecker,Slobodan Vučetić,Feng Zhang,Claire O'Donovan,Zihan Zhang,Ehsaneddin Asgari,Jianlin Cheng,Constance J Jeffery,Michele Berselli,Matteo Rè,Radoslav Davidović,Julian Gough,Ronghui You,Prajwal Bhat,Adrian M Altenhoff,Alexandra Lee,Sabeur Aridhi,Jung‐Hua Chang,Huy Nguyen,Deborah A Hogan,Stefano Pascarelli,Chenguang Zhao,Shanshan Zhang,Marco Falda,Yiwei Liu,Rui Fa,Alessandro Petrini,Alfredo Benso,Zheng Wang,Farrokh Mehryary,Itamar Borukhov,Richard Bonneau,Gage S Black,Shanfeng Zhu,Asa Ben‐Hur,Danielle Allison Brackenridge,Qizhong Mao,Giuseppe Profiti,Rabie Saidi,Burkhard Rost,Nevena Veljković,Giovanni Bosco,Angela D Wilkins,Alex Warwick Vesztrocy,Seyed Ziaeddin Alborzi,Giorgio Valentini,Dallas J Larsen,Yang Zhang,Michael L Tress,Vladimir Perović,Suyang Dai,Casey S Greene,Filip Ginter,Stefano Di Carlo,B Kacsóh,Tapio Salakoski,Ashton Omdahl,Volkan Atalay,Alan Medlar,Peter L Freddolino,Hai Fang,José Manuel Rodriguez,Sayoni Das,Fábio Fabris,Weidong Tian,Hans Moen,Md-Nafiz Hamid,Stefano Toppo,Jonas Reeb,Kai Hakala,Maria Jesus Martin,Gianfranco Politano,Olivier Lichtarge,Marie‐Dominique Devignes,Dane Jo,Tunca Doğan,Elaine Zosa,Giuliano Grossi,Ahmet Süreyya Rifaioğlu,Daisuke Kihara,Predrag Radivojac,Liisa Holm,Mohammad R K Mofrad,Christine A Orengo,Mark N Wass,Imane Boudellioua,Liam J Mcguffin,Patricia C Babbitt,José María Fernández,Alex W Crocker,Indika Kahanda,Chengsong Wan,Marco Frasca,Michal Linial,Ian Sillitoe,George P Georghiou,Chengxin Zhang,Damiano Piovesan,Enrico Lavezzo,Jonathan G Lees,Sašo Džeroski,Suwisa Kaewphan,Renzhi Cao,Fran Supek,Steven E Brenner,Wei-Cheng Tseng,Haixuan Yang,Sean D Mooney,Domenico Cozzetto,Marco Mesiti,Paolo Fontana,Natalie Thurlby,Alice C Mchardy,Maxat Kulmanov,Alex A Freitas,Rebecca L Hurto,Christophe Dessimoz,Rita Casadio,Yotam Frank,Magdalena Antczak,Shuwei Yao,Jie Hou,Iddo Friedberg

doi:10.1186/s13059-019-1835-8

Abstract

Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds in both the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory.

Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Highlights

High-throughput nucleic acid sequencing [1] and mass spectrometry proteomics [2] have provided us with a deluge of data for DNA, RNA, and proteins in diverse species.
Top methods have improved from CAFA2 to CAFA3, but the improvement was less dramatic than that from CAFA1 to CAFA2.
One of the major goals of the Critical Assessment of Functional Annotation (CAFA) is to quantify the progress in function prediction over time.
We conducted a comparative evaluation of top CAFA1, CAFA2, and CAFA3 methods according to their ability to predict Gene Ontology [28] terms on a set of common benchmark proteins.

Introduction

High-throughput nucleic acid sequencing [1] and mass spectrometry proteomics [2] have provided us with a deluge of data for DNA, RNA, and proteins in diverse species. To address the growing gap between high-throughput data and deep biological insight, a variety of computational methods that predict protein function have been developed over the years [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]. This explosion in the number of methods is accompanied by the need to understand how well they perform and what improvements are needed to satisfy the needs of the life sciences community. The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function.

Methods

Results

Conclusion