Abstract

BackgroundFormation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database.ResultsHere we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure.ConclusionsmPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.

Highlights

  • Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets

  • To validate the primary function of microbial Profiling Using Metagenomic Assembly (mPUMA) (OTU formation and abundance calculation), we tested its performance in the analysis of sequence data generated by amplification and sequencing of Cpn60 universal target sequences from a synthetic community containing cloned Cpn60 universal target sequences from 20 human vaginal bacteria with pairwise sequence identity values of 60 to 96% [27]

  • OTU formation and abundance calculations were performed on the dataset using all three options available within the mPUMA pipeline and the resulting microbial profiles were evaluated for number of OTU generated, number of reads unmapped, amount of total error generated and comparison of the profile to the known 'Target' synthetic community profile (Figure 2)

Read more

Summary

Introduction

Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. Representative sequences selected from the experimental data may not include full-length coverage of the target, depending on its length This in turn limits information content, and the ability to conduct multiple sequence alignments and phylogenetic analysis for characterization of novel OTU sequences. A limitation common to all of these approaches is apparent when the community under study contains novel sequences not represented in reference databases. In these cases, novel sequences in the experimental data may be ignored or pooled together as 'unclassified' since they do not closely resemble the reference sequences. The end result is that the aggregated description of the community may not reflect the input sequence data generated in the experiment

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call