Abstract

Background: The number of prokaryotic genome sequences becoming available is growing steadily, and faster than our ability to annotate them accurately.

Description: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network, and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment.

The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of the accuracy, consistency, and completeness of the annotations it produces. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.

Conclusion: By providing accurate, rapid annotation freely to the community, we have created an important community resource. The service has now been used by over 120 external users to annotate over 350 distinct genomes.
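One step the service performs is predicting which subsystems are represented in a genome. As a rough illustration only, the sketch below assumes a subsystem is described by a few "variants", each a set of functional roles that must all be present, and picks the most specific variant the genome fully covers. The `Variant` class, `predict_variant` function, the coverage rule, and the role names are hypothetical; this is not the RAST or SEED implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical model of a subsystem variant: a code plus the functional
    roles that must all be present for the variant to be asserted."""
    code: str
    required_roles: frozenset[str]


def predict_variant(variants: list[Variant], genome_roles: set[str]) -> str | None:
    """Return the code of the most specific variant fully covered by the genome.

    A variant is covered when every one of its required roles appears among the
    genome's asserted roles. Among covered variants, the one requiring the most
    roles wins. None means the subsystem is predicted to be absent.
    """
    covered = [v for v in variants if v.required_roles <= genome_roles]
    if not covered:
        return None
    return max(covered, key=lambda v: len(v.required_roles)).code


# Usage with made-up role names.
variants = [
    Variant("1", frozenset({"RoleA", "RoleB", "RoleC"})),
    Variant("2", frozenset({"RoleA", "RoleB"})),
]
print(predict_variant(variants, {"RoleA", "RoleB", "RoleD"}))  # 2
print(predict_variant(variants, {"RoleA", "RoleB", "RoleC"}))  # 1
print(predict_variant(variants, {"RoleD"}))                    # None
```

Preferring the variant with the most required roles is one simple tie-breaking choice; a real service would also have to handle partially covered variants and conflicting evidence.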

Highlights

  • The number of prokaryotic genome sequences becoming available is growing steadily, and faster than our ability to annotate them accurately; we describe a fully automated service for annotating bacterial and archaeal genomes.

  • The RAST server implements a series of steps that automatically produce two classes of asserted gene functions: subsystem-based assertions, which rely on recognizing functional variants of subsystems, and non-subsystem-based assertions, which are filled in using more common approaches that integrate evidence from a number of tools.

  • There are obvious limitations in using existing SEED genomes to evaluate the service, which led us to also compare RAST annotations with those of KAAS (KEGG Automatic Annotation Server) [22], the only other public annotation service we are aware of that accepts online sequence submissions.
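A comparison like the one with KAAS ultimately reduces to tallying, gene by gene, how two annotation sets relate. The sketch below is a minimal, hypothetical version: it assumes both sets are dictionaries from a gene identifier to a function string and classifies each gene as agreeing, disagreeing, or annotated by only one side. The function name, gene IDs, and normalization rule are illustrative assumptions. KAAS assigns KEGG Orthology identifiers rather than free-text functions, so a real comparison would first need a mapping between the two vocabularies, which this sketch does not attempt.

```python
from collections import Counter


def compare_annotations(a: dict[str, str], b: dict[str, str]) -> Counter:
    """Classify each gene ID by how two annotation sets relate.

    a and b map a gene identifier to its assigned function string. Strings are
    compared after lowercasing and collapsing whitespace only; no mapping
    between different naming schemes is performed.
    """
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    tally = Counter()
    for gene in a.keys() | b.keys():
        fa, fb = a.get(gene), b.get(gene)
        if fa is not None and fb is not None:
            tally["agree" if norm(fa) == norm(fb) else "disagree"] += 1
        elif fa is not None:
            tally["only_a"] += 1
        else:
            tally["only_b"] += 1
    return tally


# Usage with made-up gene IDs and functions.
rast = {"peg.1": "Enolase (EC 4.2.1.11)", "peg.2": "hypothetical protein", "peg.3": "Thioredoxin"}
kaas = {"peg.1": "enolase  (EC 4.2.1.11)", "peg.3": "glutaredoxin", "peg.4": "RecA"}
print(sorted(compare_annotations(rast, kaas).items()))
# [('agree', 1), ('disagree', 1), ('only_a', 1), ('only_b', 1)]
```

Keeping "only annotated by one side" separate from "disagree" matters when judging completeness versus accuracy, the two qualities the abstract names.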

Summary

Background

In 1995 the first complete genome became available. Since then, hundreds more have been sequenced, and it has become clear that thousands will follow shortly.

The cooperative effort to develop subsystems has produced a publicly available set of populated subsystems that includes over 600 subsystems. These subsystems include assertions of function for well over 500,000 protein-encoding genes in over 500 bacterial and archaeal genomes (relating to over 6200 functional roles). This manually curated collection represents sets of co-curated protein families.

Each of those groups can be expanded (by clicking the "+" button) down to the specific protein-encoding genes (pegs) found in a given subsystem. This page is the entry point to the Genome Browser, the Compare Metabolic Reconstruction tool, and the View Features and View Scenarios pages. The subsystem hierarchy can be expanded, and for each subsystem that has been asserted, a scenario is given with its input and output compounds, their stoichiometry, and a relevant coloured KEGG map (if one exists).
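A scenario summarizes a pathway by its net input and output compounds and their stoichiometry. One way to derive such a summary, sketched here under the assumption that a scenario is a chain of reactions with integer coefficients, is to sum the coefficients across all reactions: net-negative compounds are inputs, net-positive ones are outputs, and those that cancel are intermediates. The `net_scenario` function and its data layout are hypothetical and are not taken from the SEED's scenario code.

```python
def net_scenario(reactions: list[dict[str, int]]) -> tuple[dict[str, int], dict[str, int]]:
    """Sum a chain of reactions and split the net balance into inputs and outputs.

    Each reaction maps a compound name to its stoichiometric coefficient:
    negative for substrates consumed, positive for products formed. Compounds
    whose coefficients cancel to zero are intermediates and are dropped.
    Returns (inputs, outputs), both with positive amounts.
    """
    net: dict[str, int] = {}
    for reaction in reactions:
        for compound, coeff in reaction.items():
            net[compound] = net.get(compound, 0) + coeff
    inputs = {c: -v for c, v in net.items() if v < 0}
    outputs = {c: v for c, v in net.items() if v > 0}
    return inputs, outputs


# Hexokinase followed by phosphoglucose isomerase (protons omitted).
steps = [
    {"glucose": -1, "ATP": -1, "glucose-6-phosphate": 1, "ADP": 1},
    {"glucose-6-phosphate": -1, "fructose-6-phosphate": 1},
]
inputs, outputs = net_scenario(steps)
print(inputs)   # {'glucose': 1, 'ATP': 1}
print(outputs)  # {'ADP': 1, 'fructose-6-phosphate': 1}
```

Integer coefficients keep the example simple; pathways with fractional stoichiometry could use `fractions.Fraction` values instead without changing the logic.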
