Abstract

Metagenomic profiling, the prediction of the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a containment MinHash approach to pre-filter the reference database prior to alignment, and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
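To make the pre-filtering idea concrete, the following is a minimal sketch of a containment-style estimate, not Metalign's actual implementation. It sketches each reference genome by the k-mers with the smallest hash values and checks what fraction of them occur in the sample. All function names, the k-mer size, the sketch size, and the cutoff are illustrative assumptions. A plain Python set stands in for the compact membership structure (such as a Bloom filter) that a real tool would likely use.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash (the built-in hash() is salted per run)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(kmer_set, sketch_size):
    """Keep the k-mers with the sketch_size smallest hash values."""
    return sorted(kmer_set, key=stable_hash)[:sketch_size]


def containment_estimate(genome, sample_kmers, k, sketch_size):
    """Estimate the fraction of the genome's k-mers present in the sample."""
    sketch = bottom_sketch(kmers(genome, k), sketch_size)
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in sample_kmers)
    return hits / len(sketch)


def prefilter(references, reads, k=5, sketch_size=50, cutoff=0.5):
    """Return names of reference genomes whose estimate reaches the cutoff.

    k, sketch_size, and cutoff are hypothetical values chosen for this toy
    example; they are not parameters taken from the paper.
    """
    sample_kmers = set()
    for read in reads:
        sample_kmers |= kmers(read, k)
    return [
        name
        for name, genome in references.items()
        if containment_estimate(genome, sample_kmers, k, sketch_size) >= cutoff
    ]


if __name__ == "__main__":
    # Toy data: two reads drawn from the first genome, none from the second.
    references = {
        "present": "ACGTTGCAAGGCTTACGATCG",
        "absent": "TTTTTTTTTTTTTTTTTTTTT",
    }
    reads = ["ACGTTGCAAGGC", "GGCTTACGATCG"]
    print(prefilter(references, reads))
```

Only the genomes that pass such a filter would then be handed to the aligner, which is what keeps the alignment step tractable as the reference database grows.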

Highlights

  • Metalign’s pre-filtering technique will continue to enable alignment-based metagenomic analysis techniques even as reference databases continue to grow in size

Introduction

Microorganisms are ubiquitous in almost every natural setting, including soil [1], ocean water [2], and the human body [3], and they play critical roles in the functioning of each of these systems [4, 5]. Traditional culture-based analysis of these microbes is confounded by the presence of many microorganisms that cannot be cultured in standard laboratory settings [4, 6]. The field of metagenomics, or the analysis of whole microbial genomes recovered directly from their host environment via high-throughput sequencing, is vital to understanding microbial communities and their functions [4, 5]. Predicting the presence and relative abundance of taxa in a metagenomic sample (referred to as “taxonomic profiling”) is one of the primary means of analyzing a metagenomic sample [7, 8]. In comparison with metagenomic assembly, profiling is computationally simpler and more effective at identifying low-abundance organisms [8]. Metagenomic profiles can be obtained through read classification (where individual reads are assigned to taxa or organisms) or via the closely related technique of read
