Abstract

BackgroundBinning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut. Various methods have been developed, but most suffer from either inaccuracies or from being unable to handle millions of sequences generated in current studies. Furthermore, existing binning methods usually require a priori decisions regarding binning parameters such as a distance level for defining an OTU.ResultsWe present a novel modularity-based approach (M-pick) to address the aforementioned problems. The new method utilizes ideas from community detection in graphs, where sequences are viewed as vertices on a weighted graph, each pair of sequences is connected by an imaginary edge, and the similarity of a pair of sequences represents the weight of the edge. M-pick first generates a graph based on pairwise sequence distances and then applies a modularity-based community detection technique on the graph to generate OTUs to capture the community structures in sequence data. To compare the performance of M-pick with that of existing methods, specifically CROP and ESPRIT-Tree, sequence data from different hypervariable regions of 16S rRNA were used and binning results were compared.ConclusionsA new modularity-based clustering method for OTU picking of 16S rRNA sequences is developed in this study. The algorithm does not require a predetermined cut-off level, and our simulation studies suggest that it is superior to existing methods that require specified distance levels to define OTUs. The source code is available at http://plaza.ufl.edu/xywang/Mpick.htm.

Highlights

  • Binning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut

  • Modularity-based methods are popular in community detection; they are derived from the intuition that a graph has community structure, if the number of edges within groups is significantly more than expected by chance [15,16]

  • In the context of OTU picking, a weighted graph is formed by: i) viewing sequences as vertices, where each pair of sequences is connected by an imaginary edge, and ii) viewing the simlarity of a pair of sequences as the weight on the edge connecting these two sequences

Read more

Summary

Introduction

Binning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut. Existing binning methods usually require a priori decisions regarding binning parameters such as a distance level for defining an OTU. An initial requirement for downstream analyses of 16S rRNA sequences is the binning into operational taxonomic. One of the critical problems with existing TI methods is the need to set an appropriate distance threshold to retrieve the optimal OTU binning at a distinct taxonomic level such as species. Appropriate distance levels appear to vary depending on the chosen hypervariable region [1], which makes it impossible to create one single distance-based threshold for defining a taxonomic level [2]

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call