The Eukaryote Genome Annotation Platform at Genoscope

Betina Porcel,France Denoeud,Corinne Da Silva,Benjamin Noel,Patrick Wincker,Franck Aniere,François Artiguenave,Jean-Marc Aury,Claude Scarpelli,Jean Weissenbach,Olivier Jaillon,Sylvain Bonneval

doi:10.1038/npre.2009.3457

Betina Porcel, France Denoeud + Show 10 more

Open Access

https://doi.org/10.1038/npre.2009.3457

Copy DOI

Abstract

The Genoscope annotation workflow for eukaryote genomes relies on evidence from ab initio gene models predictions combined with homology searches, using collections of expressed sequences - full length cDNAs, ESTs or massive-scale mRNA sequences from the same or closely related organisms – proteins or other genomic sequences. Global analysis of these drafts or complete sequences are then combining both approaches in the form of gene prediction data integration using GAZE, capable to identify a majority of the existing gene features. Although of very good quality, gene-modelling remains still tentative at the end of the process. Even though computational predictors are useful on large scale annotation for global genomics analysis, there is no complete genome for which all gene structures, in terms of exons, introns and coding regions, have been experimentally confirmed.Finished genomes can provide exciting insights into the genome organization and evolution. Additional experimental data generated by genome sequencing projects give assistance to genome annotation aiming to a better understanding of the biology of the organism. Therefore, gene models and annotation can be improved by human curation to find errors or to resolve incongruous evidence on the automatic annotation of the genome. We now provide to collaborators carrying sequencing projects with a distributed annotation platform allowing expert evaluation of the annotation, in addition to our automated gene prediction pipeline.To ensure at most the participation of the scientific community, an annotation tool for revising annotations has been set up using components of the Generic Model Organism Database toolkit, which provides tools for managing organism databases. A CHADO database, linked to an Apollo graphical interface, permit users to correct gene structures and store them in a dedicated organism database, as we will show on a few examples. Such a tool would facilitate connecting and comparing predicted annotations with existing biological data, becoming the repository of complete annotated finished genome sequence.

Full Text