Abstract

The recently completed second phase of the Human Microbiome Project has highlighted the relationship between dynamic changes in the microbiome and disease, motivating new microbiome study designs based on longitudinal sampling. Yet, analysis of such data is hindered by the presence of technical noise, high dimensionality, and data sparsity. Here, we introduce LUMINATE (longitudinal microbiome inference and zero detection), a fast and accurate method for inferring relative abundances from noisy read count data. We demonstrate that LUMINATE is orders of magnitude faster than current approaches, with better or similar accuracy. We further show that LUMINATE can accurately distinguish biological zeros, when a taxon is absent from the community, from technical zeros, when a taxon is below the detection threshold. We conclude by demonstrating the utility of LUMINATE on a real dataset, showing that LUMINATE smooths trajectories observed from noisy data. LUMINATE is freely available from https://github.com/tyjo/luminate.

Highlights

  • The human body is home to trillions of microbial cells that play an essential role in health and disease (Cho and Blaser, 2012)

  • Taxa near the detection threshold may fail to appear in a sample, necessitating a distinction between a biological zero (where a taxon is absent from the community) and a technical zero (where it drops below the detection threshold) (Aijo et al., 2018)

  • Design of Simulations to Evaluate Model Performance: Ground-truth relative abundances are required for evaluating model performance


Summary

Introduction

Background: The human body is home to trillions of microbial cells that play an essential role in health and disease (Cho and Blaser, 2012). Investigating the human microbiome can provide insight into biological processes and the etiology of disease. A major paradigm for microbiome studies uses targeted amplicon sequencing of the 16S rRNA gene to produce read counts of each bacterial taxon in a sample (Kuczynski et al., 2011). Technical noise, such as uneven amplification during PCR, can produce read counts that differ substantially from the underlying community structure (Kuczynski et al., 2011). Taxa near the detection threshold may fail to appear in a sample, necessitating a distinction between a biological zero (where a taxon is absent from the community) and a technical zero (where it drops below the detection threshold) (Aijo et al., 2018). The number of taxa and time points in a sample may be large, requiring methods that scale to high-dimensional data.
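The difference between the two kinds of zero can be illustrated with a small simulation. The sketch below is not LUMINATE's model; it assumes, purely for illustration, that read counts are drawn from a multinomial distribution over the true relative abundances. The helper names `simulate_read_counts` and `label_zeros` and the example abundances are hypothetical.

```python
import random


def simulate_read_counts(rel_abundances, depth, seed=0):
    """Draw `depth` reads from a multinomial over taxa.

    Assumption for illustration only: sequencing is modeled as
    independent draws proportional to the true relative abundances.
    """
    rng = random.Random(seed)
    taxa = list(range(len(rel_abundances)))
    counts = [0] * len(rel_abundances)
    for taxon in rng.choices(taxa, weights=rel_abundances, k=depth):
        counts[taxon] += 1
    return counts


def label_zeros(rel_abundances, counts):
    """Label each taxon using the (simulated) ground truth.

    A zero count is a biological zero if the true abundance is 0,
    and a technical zero if the taxon is present but went unobserved.
    """
    labels = []
    for abundance, count in zip(rel_abundances, counts):
        if count > 0:
            labels.append("observed")
        elif abundance == 0:
            labels.append("biological zero")
        else:
            labels.append("technical zero")
    return labels


# Hypothetical community: one rare taxon and one truly absent taxon.
true_abundances = [0.6, 0.3999, 0.0001, 0.0]
counts = simulate_read_counts(true_abundances, depth=1000)
print(counts)
print(label_zeros(true_abundances, counts))
```

In a simulation the true abundances are known, so the labels can be assigned directly. In real data only the counts are observed, which is why distinguishing the two cases requires inference.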
