Abstract
Microbial community profiling identifies and quantifies organisms in metagenomic sequencing data using either reference-based or unsupervised approaches. However, current reference-based profiling methods only report the presence and abundance of single reference genomes that are available in databases. Since only a small fraction of environmental genomes is represented in genomic databases, these approaches entail the risk of false identifications and often suggest a higher precision than the data justify. Therefore, we developed MicrobeGPS, a novel metagenomic profiling approach that overcomes these limitations. MicrobeGPS is the first method that identifies microbiota in the sample and estimates their genomic distances to known reference genomes. With this strategy, MicrobeGPS identifies organisms down to the strain level and highlights possibly inaccurate identifications when the correct reference genome is missing. We demonstrate on three metagenomic datasets of different origins that our approach successfully avoids misleading interpretations of results and additionally provides more accurate results than current profiling methods. Our results indicate that MicrobeGPS can enable reference-based taxonomic profiling of complex and less characterized microbial communities. MicrobeGPS is open source and available from https://sourceforge.net/projects/microbegps/ as source code and binary distribution for Windows and Linux operating systems.
Highlights
Recent advances in experimental and computational technologies have increased the number and diversity of sequenced microbial organisms
We present experimental results demonstrating that MicrobeGPS provides more accurate community composition estimates than previous approaches and that it offers a new quality in analyzing microbial communities
We analyzed the community composition of the Mock Community (MC) with the state-of-the-art methods Pathoscope [5], MetaPhlAn [4], and mOTUs [10] and compared the results with MicrobeGPS. Both MicrobeGPS and Pathoscope build upon the alignment of the metagenomic reads to a database of microbial reference genomes
Summary
Recent advances in experimental and computational technologies have increased the number and diversity of sequenced microbial organisms. Single cell sequencing [1, 2] and metagenome assembly [3] allow extracting the genomic sequences even of uncultivable bacteria. With the increasing number of microbial reference sequences, reference-based metagenomic analysis methods have become significantly more powerful and popular [4, 5, 6, 7]. Although the taxonomic resolution of reference-based methods in whole genome sequencing metagenomic experiments is higher than for other strategies such as 16S rRNA [8] or composition-based taxonomic profiling [9], these methods encounter a different problem: the reference genome databases are still far from complete and, due to constant evolution, will never be. The often proposed species or strain level accuracy [5] is only achieved if
PLOS ONE | DOI:10.1371/journal.pone.0117711 February 2, 2015