Abstract

Here, we describe MetaErg, a standalone and fully automated metagenome and metaproteome annotation pipeline. Annotation of metagenomes is challenging. First, metagenomes contain sequence data of many organisms from all domains of life. Second, many of these are from understudied lineages, encoding genes with low similarity to experimentally validated reference genes. Third, assembly and binning are not perfect, sometimes resulting in artifactual hybrid contigs or genomes. To address these challenges, MetaErg provides graphical summaries of annotation outcomes, both for the complete metagenome and for individual metagenome-assembled genomes (MAGs). It performs a comprehensive annotation of each gene, including taxonomic classification, enabling functional inferences despite low similarity to reference genes, as well as detection of potential assembly or binning artifacts. When provided with metaproteome information, it visualizes gene and pathway activity using sequencing coverage and proteomic spectral counts, respectively. For visualization, MetaErg provides an HTML interface that brings all annotation results together and produces sortable and searchable tables, collapsible trees, and other graphic representations, enabling intuitive navigation of complex data. MetaErg, implemented in Perl, HTML, and JavaScript, is a fully open-source application, distributed under the Academic Free License at https://github.com/xiaoli-dong/metaerg. MetaErg is also available as a Docker image at https://hub.docker.com/r/xiaolidong/docker-metaerg.

Highlights

  • Genome annotation is, literally, the annotation of features on assembled DNA molecules

  • We present MetaErg, an extendable standalone annotation pipeline developed for metagenome-assembled genomes (MAGs)

  • With MetaErg, we provide a standalone and fully automated metagenome and metaproteome annotation pipeline


Summary

Introduction

Genome annotation is, literally, the annotation of features on assembled DNA molecules. Such features are, in the first place, genes, including those encoding proteins [“open reading frames” (ORFs)] and those encoding ribosomal or transfer RNA molecules. Annotation is usually the final step of the automated computational processing of genomic or metagenomic data and the beginning of biology. Depending on their background and research question, biologists will have different annotation needs. When the research targets a single microbe, a detailed gene-by-gene annotation of its genome is desirable.
