Abstract

Background

Despite improvements in tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort.

Results

We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CAs) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules, and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website, as well as in the Community Annotation track of the Genome Browser.

Conclusion

We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at .

Highlights

  • Despite improvements in tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation.

  • To improve the quality of the rice genome annotation, we have developed EuCAP (Eukaryotic Community Annotation Package), a portable and flexible system for 1) the submission of structural and functional annotation of genes by dispersed Community Annotators (CAs), 2) the evaluation of CA contributions by Project Annotators (PAs), and 3) the incorporation of CA contributions into our ongoing annotation effort.

  • The submitted model can be visualized alongside the Osa1 model(s), the Rice Transcript Assemblies, and full-length cDNAs that align to the same region.
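
The three steps listed above (submission by CAs, review by PAs, incorporation into the annotation) can be pictured as a small state machine. The following is only an illustrative Python sketch, not EuCAP code: the class, field, and function names are hypothetical, the statuses are assumptions, and the record layout does not reflect the actual EuCAP Gene Model database schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Assumed life cycle of a community-submitted model (hypothetical)."""
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    INTEGRATED = "integrated"


@dataclass
class CommunityModel:
    """Hypothetical record for one CA gene model; not the EuCAP schema."""
    locus: str        # made-up identifier for the aligned model
    family: str       # gene family the CA assigned the model to
    exons: list       # (start, end) pairs on the pseudomolecule
    annotation: str   # functional annotation supplied by the CA
    status: Status = Status.SUBMITTED


def review(model, approve):
    """Step 2: a project annotator approves or rejects a submitted model."""
    if model.status is not Status.SUBMITTED:
        raise ValueError("only submitted models can be reviewed")
    model.status = Status.APPROVED if approve else Status.REJECTED
    return model


def integrate(models):
    """Step 3: mark approved models as integrated and group loci by family."""
    by_family = {}
    for m in models:
        if m.status is Status.APPROVED:
            m.status = Status.INTEGRATED
            by_family.setdefault(m.family, []).append(m.locus)
    return by_family


# Usage with made-up data: two submissions, one approved and one rejected.
a = CommunityModel("locus_A", "family_1", [(100, 250), (400, 520)], "kinase")
b = CommunityModel("locus_B", "family_1", [(900, 1200)], "unknown")
review(a, approve=True)
review(b, approve=False)
grouped = integrate([a, b])
```

Keeping the status on the record itself means a rejected model is never lost; it simply does not reach the grouped, per-family view that mirrors how the CA annotations are displayed.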


Introduction

Despite improvements in tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. The pipeline is mature: it integrates ESTs and full-length cDNAs with the FGENESH ab initio gene predictions, using the Program to Assemble Spliced Alignments (PASA [2]) for structural annotation. Despite the availability of over 1.1 million ESTs and 33,000 full-length cDNAs [7], 28,006 of the 51,286 non-transposable element-related gene models predicted in the rice genome (roughly 55%) have no or only partial experimental support for their structure. Some genes are probably not detected by our pipeline at all, as suggested by Massively Parallel Signature Sequence (MPSS) data [8].
