Improved detection of disease-associated gut microbes using 16S sequence-based biomarkers

Brianna S Chrisman, Nate Stockham, Todd Z Desantis, Dennis P Wall, Maude David, Jae-Yoon Jung, Christine Tataru, Kelley M Paskov, Peter Y Washington, Shoko Iwai, Maya Varma

doi:10.1186/s12859-021-04427-7

Abstract

Background: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than state-of-the-art methods in biomarker discovery and phenotype prediction.

Results: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods, both in a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we identify several differentially abundant microbial groups (FDR < 0.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR < 0.1). We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of 0.64 and the obesity phenotype with an ROC-AUC score of 0.84.

Conclusions: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers.

Highlights

Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut.
Sequence-based biomarkers (SBBs) provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction.
Multi-loci SBBs yield high statistical power in identifying differentially abundant microbial groups.
Using sequence-based biomarkers, we were able to identify groups of microbes differentially enriched in autism as well as in obesity with high statistical power.

Summary

Introduction

Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. Downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort.

Microbial community profiling by 16S sequencing (Fig. 1) involves designing primers to target conserved sequences around a hypervariable region of choice, amplifying the region from a mixture of diverse genomes, and performing short-read sequencing of the amplicons. These exact sequencing variants (ESVs) are preprocessed to remove noise and sequencing artifacts in order to obtain amplicon sequence variants (ASVs) [6]. The final output of this pipeline is a matrix of read counts for each ASV.
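To make the idea of aggregating an ASV count matrix by a sequence variant more concrete, here is a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked in the abstract). The function names `aggregate_by_variant` and `permutation_test`, the toy count matrix, and the assumption that ASV sequences are already aligned to a common coordinate system are all hypothetical and chosen only for illustration. The sketch sums read counts over every ASV that carries a given base at a given aligned position, then runs a simple two-sided permutation test on the difference in group means.

```python
import numpy as np


def aggregate_by_variant(counts, aligned_seqs, position, base):
    """Sum read counts over all ASVs carrying `base` at aligned `position`.

    counts:       (n_samples, n_asvs) array of read counts.
    aligned_seqs: list of n_asvs equal-length, pre-aligned sequences
                  (an assumption of this sketch).
    """
    mask = np.array([seq[position] == base for seq in aligned_seqs])
    return counts[:, mask].sum(axis=1)


def permutation_test(values, labels, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    observed = values[labels].mean() - values[~labels].mean()
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        diff = values[shuffled].mean() - values[~shuffled].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 6 samples x 4 ASVs.
counts = np.array([
    [10, 0, 5, 1],
    [12, 1, 4, 0],
    [9, 0, 6, 2],
    [1, 8, 0, 7],
    [0, 9, 1, 6],
    [2, 7, 0, 8],
])
aligned_seqs = ["ACGT", "ATGT", "ACGA", "ATGA"]
labels = [True, True, True, False, False, False]  # e.g. case vs. control

# Single-variant feature: all ASVs with a C at aligned position 1.
feature = aggregate_by_variant(counts, aligned_seqs, position=1, base="C")
p_value = permutation_test(feature, labels)
print(feature, p_value)
```

In a real analysis one would likely work with relative abundances instead of raw counts, test many variants and combinations of variants, and correct the resulting p-values for multiple testing (the abstract reports FDR thresholds); none of that is shown here.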



Journal: BMC Bioinformatics
Publication Date: Oct 19, 2021
Citations: 4
License type: open-access
