Abstract

Background: Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes.

Results: We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences.

Conclusion: Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.

Electronic supplementary material: The online version of this article (doi:10.1186/s40168-015-0093-6) contains supplementary material, which is available to authorized users.

Highlights

  • Metagenomics can provide important insight into microbial communities

  • Using a hidden Markov model (HMM), the paths most likely to code for the target gene can be extended first, limiting the portion of the assembly graph that must be explored

  • Xander assembly of pooled rhizosphere soil data: because the longer kmer and pruning performed better on the Human Microbiome Project (HMP)-defined community data, a kmer length of 45 and prune 20 were used throughout the analyses described below
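
The idea of extending the most promising paths first can be illustrated with a small best-first search over a de Bruijn graph. The sketch below is only a toy under stated assumptions, not Xander's implementation: the function names (`build_de_bruijn`, `toy_score`, `guided_assemble`) are hypothetical, the scoring function is a simple nucleotide match count standing in for a protein profile HMM score, and a kmer length of 4 is used so the example stays readable.

```python
import heapq
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of bases that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[-1])
    return graph


def toy_score(path, target):
    """Hypothetical stand-in for an HMM score: positions agreeing with target."""
    return sum(1 for a, b in zip(path, target) if a == b)


def guided_assemble(graph, start, target, k):
    """Best-first traversal: always extend the highest-scoring partial path.

    Returns (contig, number_of_paths_popped_from_the_queue).
    """
    # heapq is a min-heap, so scores are negated to pop the best path first.
    heap = [(-toy_score(start, target), start)]
    explored = 0
    while heap:
        _, path = heapq.heappop(heap)
        explored += 1
        if len(path) == len(target):
            return path, explored
        node = path[-(k - 1):]  # last (k-1)-mer of the path
        for base in sorted(graph.get(node, ())):
            extended = path + base
            heapq.heappush(heap, (-toy_score(extended, target), extended))
    return None, explored


# Toy data: the third read creates a branch that does not match the target.
reads = ["ATGGCGT", "GCGTACC", "GCGTTTT"]
graph = build_de_bruijn(reads, k=4)
contig, explored = guided_assemble(graph, "ATG", "ATGGCGTACC", k=4)
print(contig, explored)
```

In this toy graph the path through `GCGTTTT` is queued but never extended, because the on-target path always scores higher. That is the sense in which a guiding score limits how much of the graph is visited; a real profile HMM scores translated protein sequence rather than comparing against a single known nucleotide string.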


Summary

Introduction

Assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. Metagenomics faces scalability challenges stemming from the amount of raw sequencing data necessary to describe complex microbial communities, often termed the microbiome [1, 2]. We propose a gene-targeted assembly approach called Xander for assembling metagenomic datasets. Xander is a de Bruijn graph [7] assembler [8] that uses external information to perform a guided, instead of exhaustive, traversal of the assembly graph. Xander uses profile hidden Markov models (HMMs) [9] to guide graph traversal (HMM-guided assembly). Using an HMM, the paths most likely to code for the target gene can be extended first, limiting the portion of the assembly graph that must be explored. In addition to limiting the graph traversal, the HMM
