Abstract

A fundamental goal of microbial ecology is to accurately determine the species composition of a given microbial ecosystem. In the context of the human microbiome, this is important for establishing links between microbial species and disease states. Here we benchmark the Microba Community Profiler (MCP) against other metagenomic classifiers using 140 moderate to complex in silico microbial communities and a standardized reference genome database. MCP generated accurate relative abundance estimates and made substantially fewer false positive predictions than other classifiers while retaining a high recall rate. We further demonstrate that the accuracy of species classification increased substantially when using the Microba Genome Database, which is more comprehensive than the reference datasets used by other classifiers; this illustrates the importance of including genomes of uncultured taxa in reference databases. Consequently, MCP classifies appreciably more reads than other classifiers do when they are run with their own recommended reference databases. These results establish MCP as best-in-class with the ability to produce comprehensive and accurate species profiles of human gastrointestinal samples.

Highlights

  • Accurately establishing the composition of microbial communities from metagenomic data remains a challenge due to their complexity, the comparatively short read length of the most widely used sequencing technologies, and incomplete genome reference databases (Sczyrba et al, 2017; Ye et al, 2019). This latter limitation is being addressed by recent approaches that recover high-quality metagenome-assembled genomes (MAGs) from metagenomic datasets, resulting in the availability of tens of thousands of draft genomes of uncultured taxa, most notably from the human gastrointestinal tract (Almeida et al, 2019; Nayfach et al, 2019; Pasolli et al, 2019).

  • We evaluated the performance of Microba Community Profiler (MCP) and nine publicly available metagenomic classifiers (Table 1), which use a variety of approaches and have previously been shown to be among the best-performing classifiers (Lindgreen et al, 2016; Sczyrba et al, 2017; Ye et al, 2019; Seppey et al, 2020).

  • High-quality isolate genomes were included in the standardized reference database to ensure that classification performance would not be adversely impacted by low genome quality and to reflect that most classifiers recommend the use of reference databases composed solely of complete isolate genomes.

Introduction

Identifying the microbial species present in natural biological samples is essential for understanding their role in a range of applications, including developing diagnostics and therapeutics for human health (Greenblum et al, 2012; Lloyd-Price et al, 2016; Gentile and Weir, 2018; Zmora et al, 2019), refining agricultural practices (Kennedy and Smith, 1995; Orellana et al, 2018), and gaining insights into biogeochemical cycles (Kuypers et al, 2018; Evans et al, 2019). Accurately establishing the composition of microbial communities from metagenomic data remains a challenge due to their complexity, the comparatively short read length of the most widely used sequencing technologies (typically 150–250 bp), and incomplete genome reference databases (Sczyrba et al, 2017; Ye et al, 2019). This latter limitation is being addressed by recent approaches that recover high-quality metagenome-assembled genomes (MAGs) from metagenomic datasets, resulting in the availability of tens of thousands of draft genomes of uncultured taxa, most notably from the human gastrointestinal tract (Almeida et al, 2019; Nayfach et al, 2019; Pasolli et al, 2019).
