Tree-aggregated predictive modeling of microbiome data

Jacob Bien, Léo Simpson, Xiaohan Yan, Christian L Müller

doi:10.1038/s41598-021-93645-3

Abstract

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.
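For contrast with the data-adaptive aggregation described above, the conventional fixed-rank preprocessing step can be sketched as follows. This is only an illustration of the traditional workflow the abstract mentions, not of trac itself; the function name `aggregate_by_rank`, the ASV identifiers, and the toy counts and taxonomy table are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_rank(counts, taxonomy, rank):
    """Sum per-ASV counts within each taxon at one fixed taxonomic rank.

    counts:   dict mapping ASV id -> read count in a single sample
    taxonomy: dict mapping ASV id -> dict of rank name -> taxon name
    rank:     the user-chosen rank, e.g. "genus" or "phylum"
    """
    totals = defaultdict(int)
    for asv, count in counts.items():
        totals[taxonomy[asv][rank]] += count
    return dict(totals)


# Toy sample (hypothetical values): sparse counts at the ASV level.
counts = {"asv1": 0, "asv2": 3, "asv3": 5, "asv4": 1}
taxonomy = {
    "asv1": {"phylum": "Firmicutes", "genus": "Blautia"},
    "asv2": {"phylum": "Firmicutes", "genus": "Blautia"},
    "asv3": {"phylum": "Bacteroidetes", "genus": "Bacteroides"},
    "asv4": {"phylum": "Bacteroidetes", "genus": "Bacteroides"},
}

genus_counts = aggregate_by_rank(counts, taxonomy, "genus")
phylum_counts = aggregate_by_rank(counts, taxonomy, "phylum")
```

In this fixed-rank approach the rank is chosen once by the user and applied uniformly to every branch of the tree, which is the manual decision the abstract says trac aims to reduce.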

Highlights

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale
At the most granular level, the data are summarized in count or relative abundance tables of operational taxonomic units (OTUs) at a prescribed sequence similarity level or denoised amplicon sequence variants (ASVs)[6]
To find a suitable aggregation level along the solution path, we use cross validation (CV) with mean squared error to select the regularization parameter λ ∈ [λ_min, λ_max] for all the results presented in this paper
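The cross-validation step in the last highlight can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-feature ridge estimator stands in for the tree-aggregated model, and the function names, the synthetic data, the grid endpoints, and the number of folds are all hypothetical. Only the selection rule (pick the regularization parameter, written λ here, with the lowest cross-validated mean squared error over a grid between a minimum and a maximum value) comes from the text.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a single feature without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k):
    """Mean squared error over all held-out points in k-fold CV."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        beta = ridge_slope(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], lam
        )
        squared_error += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return squared_error / n


def select_lambda(xs, ys, lambdas, k=5):
    """Return the grid value with the lowest CV error, plus all scores."""
    scores = {lam: cv_mse(xs, ys, lam, k) for lam in lambdas}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (assumption): a linear signal with a small deterministic
# perturbation so the example is reproducible without a random seed.
xs = [0.5 * i for i in range(1, 21)]
ys = [2.0 * x + 0.3 * (-1) ** i for i, x in enumerate(xs)]

# Log-spaced grid between hypothetical endpoints lam_min and lam_max.
lam_min, lam_max, m = 1e-3, 1e2, 10
lambdas = [lam_min * (lam_max / lam_min) ** (j / (m - 1)) for j in range(m)]

best_lam, scores = select_lambda(xs, ys, lambdas, k=5)
```

Each observation is held out exactly once, so `cv_mse` averages the squared prediction error over the full sample; a path-based method would evaluate this error at each grid point along its solution path in the same way.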

Summary

Introduction

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. Recent advances in modern targeted amplicon and metagenomic sequencing technologies provide a cost-effective means to get a glimpse into the complexity of natural microbial communities, ranging from marine and soil to host-associated ecosystems[3,4,5]. Relating these large-scale observational microbial sequencing surveys to the structure and functioning of microbial ecosystems and the environments they inhabit has remained a formidable scientific challenge.



Journal: Scientific Reports
Publication Date: Jul 15, 2021
Citations: 20
License type: open-access

