Abstract

The DOE-JGI Microbial Genome Annotation Pipeline (MGAP) performs structural and functional annotation of microbial genomes that are subsequently included in the Integrated Microbial Genomes (IMG) comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets submitted through the IMG submission site. Before a dataset can be submitted for annotation, the project and its associated metadata must be described in GOLD. MGAP sequence data processing begins with feature prediction: identification of protein-coding genes, non-coding RNAs and regulatory RNA features, and CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

Highlights

  • The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of bacterial and archaeal genomes included in the Integrated Microbial Genomes (IMG) system [1].

  • Each sequence dataset submitted for annotation must be associated with an analysis project that has already been specified in the Genomes OnLine Database (GOLD) [2].

Summary

KEGG Orthology term assignment

Genes that can be unambiguously mapped to entries in the KEGG Genes database are assigned the KO terms associated with the corresponding KEGG gene. Genes that are not mapped to KEGG genes are searched against the database of KEGG genes with USEARCH, using the UBLAST algorithm [14]. The results of this search are organized into a list of candidate KO assignments, and KO terms are assigned to genes from a subset of this list: hits must meet an E-value cutoff of 1e-5, assignments are taken from the top 5 hits, and each hit must have at least 30 % alignment sequence identity and an alignment covering at least 70 % of the length of both the query gene and the KEGG subject gene.

MetaCyc assignment: genes are associated with MetaCyc [15] reactions.

The script helps resolve overlaps between hits to Pfam models from the same clan in order to generate the final Pfam assignments.
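The KO term thresholds described in this section can be sketched as a simple filter over search hits. This is a minimal illustration, not the pipeline's actual code: the `Hit` record, its field names, and the `candidate_kos` function are hypothetical, and the sketch assumes that hits are ranked by E-value, that the top 5 are taken before the identity and coverage filters are applied, and that the cutoffs are inclusive. The source does not specify any of these details.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One search hit of a query gene against a KEGG gene (hypothetical record)."""
    ko: str              # KO term attached to the KEGG subject gene
    evalue: float        # E-value reported for the alignment
    pct_identity: float  # alignment sequence identity, in percent
    query_cov: float     # percent of the query gene covered by the alignment
    subject_cov: float   # percent of the KEGG subject gene covered


def candidate_kos(
    hits: List[Hit],
    max_evalue: float = 1e-5,
    top_n: int = 5,
    min_identity: float = 30.0,
    min_coverage: float = 70.0,
) -> List[str]:
    """Return KO terms supported by hits that pass the stated thresholds.

    Assumption: the top_n hits are chosen by ascending E-value first,
    and the remaining filters are applied to those hits only.
    """
    ranked = sorted(hits, key=lambda h: h.evalue)[:top_n]
    kos: List[str] = []
    for h in ranked:
        passes = (
            h.evalue <= max_evalue
            and h.pct_identity >= min_identity
            and h.query_cov >= min_coverage
            and h.subject_cov >= min_coverage
        )
        # Keep each KO term once, in order of best supporting hit.
        if passes and h.ko not in kos:
            kos.append(h.ko)
    return kos
```

Requiring coverage on both the query and the subject excludes hits where only a short domain of a long protein aligns, which is one plausible reason for the two-sided 70 % criterion.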

InterPro Scan