Abstract

The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation of metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets, which consist of scaffolds/contigs with optional coverage information and/or unassembled reads in FASTA and FASTQ formats. MAP processing begins with feature prediction, which identifies protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. This structural annotation is followed by functional annotation, including assignment of protein product names and connection to various protein family databases.

Highlights

  • The DOE-JGI Metagenome Annotation Pipeline (MAP) supports the structural and functional annotation of metagenomic datasets submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system [1]

  • The DOE-JGI MAP requires as input a multi-FASTA file of assembled nucleotide sequences and/or a FASTQ file of unassembled 454, Illumina or PacBio reads; the pipeline does not assemble the unassembled reads

  • Consistency and reproducibility of the results produced by MAP depend on the databases and software used in the pipeline
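
As a rough illustration of the input requirement in the second highlight, a submitter might check locally whether a file looks like FASTA or FASTQ before uploading it. The sketch below is not part of MAP; the function names `sniff_sequence_format` and `count_fasta_records` are hypothetical. It relies only on the standard conventions that FASTA records start with `>` and FASTQ records start with `@`.

```python
from typing import Iterable


def sniff_sequence_format(lines: Iterable[str]) -> str:
    """Guess 'fasta' or 'fastq' from the first non-blank line.

    Hypothetical helper for illustration only; MAP's own input
    validation is not described in the source text.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"  # empty input


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count header lines, i.e. the number of records in a multi-FASTA file."""
    return sum(1 for line in lines if line.startswith(">"))


# Small in-memory examples standing in for real files.
fasta_example = [">contig_1", "ACGTACGT", ">contig_2", "TTGACA"]
fastq_example = ["@read_1", "ACGT", "+", "IIII"]

fasta_format = sniff_sequence_format(fasta_example)
fastq_format = sniff_sequence_format(fastq_example)
fasta_records = count_fasta_records(fasta_example)
```

A check on the first character is deliberately shallow: it catches a mislabeled file but does not verify that every record is well formed.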


Introduction

The DOE-JGI Metagenome Annotation Pipeline (MAP) supports the structural and functional annotation of metagenomic datasets submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system [1]. The annotation includes the prediction of CRISPR elements, non-coding and protein-coding genes, and ends with the assignment of a product name and the prediction of functions for each gene. The annotated metagenomic datasets produced by MAP are integrated into IMG/M, where they can be analyzed or revised in the context of a comprehensive set of publicly available genomes and metagenomes. To submit sequence datasets for annotation, users must first define their analysis projects in GOLD. Identification of genes and repeats produces a GFF file that carries no functional information for the predicted genes. The predicted protein-coding genes are then assigned functions, and the annotated dataset is integrated into IMG.
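
To make the structural annotation output more concrete, the following minimal sketch tallies feature types in a GFF file. It assumes the standard nine-column, tab-separated GFF layout in which the third column holds the feature type. The example records, the `sketch` source label, and the type names `CDS` and `tRNA` are illustrative assumptions and are not taken from actual MAP output.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Count features per type (column 3) in GFF-formatted lines.

    Comment/directive lines (starting with '#') and lines that do not
    have exactly nine tab-separated columns are skipped.
    """
    counts: Counter = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        counts[cols[2]] += 1
    return counts


# Hypothetical records: seqid, source, type, start, end, score,
# strand, phase, attributes. No product names are present, matching
# a GFF file that has no functional information yet.
example_gff = [
    "##gff-version 3",
    "contig_1\tsketch\tCDS\t10\t400\t.\t+\t0\tID=gene_1",
    "contig_1\tsketch\ttRNA\t500\t575\t.\t-\t.\tID=gene_2",
    "contig_2\tsketch\tCDS\t1\t300\t.\t+\t0\tID=gene_3",
]

feature_counts = count_feature_types(example_gff)
```

A summary of this kind gives a quick check of how many protein-coding and non-coding features were predicted before the functional annotation step adds product names.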
