Abstract

A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing; however, this approach can generate biased estimates because the copy number of the gene differs even between closely related organisms, and because of PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is their use of a universal classification threshold for all genes at all taxonomic levels. We propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level.

We present MetaPhyler (http://metaphyler.cbcb.umd.edu), a novel taxonomic classifier that uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, MEGAN and PhymmBL). We also present results from the analysis of a real metagenomic dataset.

We have introduced a novel taxonomic classification method for analyzing microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples.
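
To make the contrast with a universal threshold concrete, the sketch below assigns a read to the most specific taxonomic level whose score cutoff its best hit passes. Each cutoff depends on the marker gene, the taxonomic level, and the alignment length. This is a minimal illustration under stated assumptions, not MetaPhyler's implementation: the linear form of the cutoff, the names `Cutoff`, `CUTOFFS` and `classify`, the use of `rpoB` as a key, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Taxonomic levels ordered from most specific to most general.
LEVELS = ["genus", "family", "order", "class", "phylum"]


@dataclass(frozen=True)
class Cutoff:
    """Hypothetical linear cutoff on the best-hit bit score."""
    slope: float
    intercept: float

    def at(self, alignment_length: int) -> float:
        # With a positive slope, the cutoff grows with alignment length:
        # a longer match needs a higher score to support the same assignment.
        return self.slope * alignment_length + self.intercept


# Made-up cutoffs keyed by (marker gene, taxonomic level). In practice such
# cutoffs would be estimated from reference data rather than hard-coded.
CUTOFFS: Dict[Tuple[str, str], Cutoff] = {
    ("rpoB", "genus"): Cutoff(1.6, 10.0),
    ("rpoB", "family"): Cutoff(1.3, 8.0),
    ("rpoB", "order"): Cutoff(1.0, 6.0),
    ("rpoB", "class"): Cutoff(0.8, 5.0),
    ("rpoB", "phylum"): Cutoff(0.5, 4.0),
}


def classify(
    gene: str,
    bit_score: float,
    alignment_length: int,
    lineage: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (level, taxon) for the most specific level whose cutoff is met.

    `lineage` maps each level to the taxon of the best-hit reference sequence.
    Returns None when no cutoff is met, so the read stays unclassified.
    """
    for level in LEVELS:
        cutoff = CUTOFFS.get((gene, level))
        if cutoff is not None and bit_score >= cutoff.at(alignment_length):
            return level, lineage[level]
    return None


lineage = {
    "genus": "Escherichia",
    "family": "Enterobacteriaceae",
    "order": "Enterobacterales",
    "class": "Gammaproteobacteria",
    "phylum": "Proteobacteria",
}
print(classify("rpoB", 120.0, 100, lineage))  # ('order', 'Enterobacterales')
print(classify("rpoB", 120.0, 200, lineage))  # ('phylum', 'Proteobacteria')
```

With these made-up values, a hit scoring 120 bits over 100 aligned positions is assigned at the order level, while the same score over 200 positions only reaches the phylum level, because a longer alignment with the same score implies lower per-position similarity. A single universal threshold would treat both reads identically.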

Highlights

  • A major goal of metagenomics is to characterize the microbial composition of an environment

  • Approaches based on 16S rRNA sequencing provide a biased estimate of microbial diversity, due to the wide variability in copy number of the 16S gene even within closely related organisms (Figure 1a), and due to amplification biases inherent in PCR

  • We carried out a simulated metagenomic study comparing MetaPhyler with three other widely used tools: WebCarma [9], MEGAN [8] and PhymmBL [10]
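
Because the origin of every simulated read is known, a classifier can be scored at each taxonomic level by comparing its assignments with the truth. The sketch below computes sensitivity (correct assignments over all reads) and precision (correct assignments over classified reads). These definitions are a common convention and an assumption here, not necessarily the exact metrics used in the study; the function name `level_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Optional, Tuple


def level_metrics(
    predictions: Dict[str, Optional[str]],
    truth: Dict[str, str],
) -> Tuple[float, float]:
    """Sensitivity and precision at a single taxonomic level.

    truth:       read id -> true taxon (known because the reads are simulated)
    predictions: read id -> predicted taxon, or None / missing if unclassified
    """
    total = len(truth)
    classified = [r for r in truth if predictions.get(r) is not None]
    correct = sum(1 for r in classified if predictions[r] == truth[r])
    # Sensitivity: correct assignments over all reads.
    sensitivity = correct / total if total else 0.0
    # Precision: correct assignments over reads that received any assignment.
    precision = correct / len(classified) if classified else 0.0
    return sensitivity, precision


truth = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}
predictions = {"r1": "A", "r2": "B", "r3": "B"}  # r4 left unclassified
print(level_metrics(predictions, truth))  # (0.5, 0.6666666666666666)
```

Reporting both numbers matters because a tool can raise its precision simply by leaving uncertain reads unclassified, at the cost of lower sensitivity.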


Summary

Introduction

One fundamental goal in metagenomics is to characterize the taxonomic diversity of a microbial community, a task known as taxonomic profiling. This is usually achieved by targeted sequencing of the 16S rRNA gene, either as a whole or focused on a hypervariable region within the gene [6]. The sequences are then classified by similarity against a curated reference 16S rRNA database [7]. This approach has been a powerful research tool, allowing biologists to explore the majority of previously unknown microorganisms populating our world. However, approaches based on 16S rRNA sequencing provide a biased estimate of microbial diversity, due both to the wide variability in copy number of the 16S gene, even among closely related organisms (Figure 1a), and to the amplification biases inherent in PCR.
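
A small worked example shows how copy-number variation alone skews 16S-based estimates. Assume two taxa, A and B, present in equal numbers of cells, with one and seven 16S copies per genome respectively (hypothetical values chosen for round numbers). Even with no PCR bias, B contributes seven times as many reads as A. The function names below are illustrative.

```python
from typing import Dict


def expected_read_fractions(
    cell_abundance: Dict[str, float], copy_number: Dict[str, int]
) -> Dict[str, float]:
    """Share of 16S reads per taxon if reads scale with cells x gene copies.

    Ignores PCR amplification bias; only copy-number bias is modeled.
    """
    weighted = {t: cell_abundance[t] * copy_number[t] for t in cell_abundance}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}


def correct_for_copy_number(
    read_counts: Dict[str, float], copy_number: Dict[str, int]
) -> Dict[str, float]:
    """Estimate relative cell abundance by dividing read counts by copy number."""
    per_cell = {t: read_counts[t] / copy_number[t] for t in read_counts}
    total = sum(per_cell.values())
    return {t: v / total for t, v in per_cell.items()}


copies = {"A": 1, "B": 7}  # hypothetical 16S copy numbers per genome
print(expected_read_fractions({"A": 50, "B": 50}, copies))  # {'A': 0.125, 'B': 0.875}
print(correct_for_copy_number({"A": 125, "B": 875}, copies))  # {'A': 0.5, 'B': 0.5}
```

Dividing by copy number undoes the skew only when every taxon's copy number is known, which is often not the case for uncharacterized organisms, and it leaves PCR amplification bias untouched.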
