Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites

Janko Tackmann, João Frederico Matias Rodrigues, Christian von Mering, Thomas Sebastian Benedikt Schmidt, Natasha Arora

doi:10.1186/s40168-018-0565-6

Open Access

Abstract

Background: The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately predicted by microbial content. However, these studies were usually restricted to single datasets with consistent experimental methods and conditions, as well as comparatively small sample numbers. The effects of study-specific biases and statistical power on classification performance and biomarker identification thus remain poorly understood. Furthermore, reliable detection in mixtures of different body sites or with noise from environmental contamination has rarely been investigated thus far. Finally, the impact of ecological associations between microbes on biomarker discovery was usually not considered in previous work.

Results: Here we present the analysis of one of the largest cross-study sequencing datasets of microbial communities from human body sites (15,082 samples from 57 publicly available studies). We show that training a Random Forest Classifier on this aggregated dataset increases prediction performance for body sites by 35% compared to a single-study classifier. Using simulated datasets, we further demonstrate that the source of different microbial contributions in mixtures of different body sites or with soil can be detected starting at 1% of the total microbial community. We apply a biomarker selection method that excludes indirect environmental associations driven by microbe-microbe associations, yielding a parsimonious set of highly predictive taxa including novel biomarkers and excluding many previously reported taxa. We find a considerable fraction of unclassified biomarkers (“microbial dark matter”) and observe that negatively associated taxa have a surprisingly high impact on classification performance. We further detect a significant enrichment of rod-shaped, motile, and sporulating taxa for feces biomarkers, consistent with a highly competitive environment.

Conclusions: Our machine learning model shows strong body site classification performance, both in single-source samples and mixtures, making it promising for tasks requiring high accuracy, such as forensic applications. We report a core set of ecologically informed biomarkers, inferred across a wide range of experimental protocols and conditions, providing the most concise, general, and least biased overview of body site-associated microbes to date.

Highlights

The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics
We found that 91.4% of the 1,397 predictive operational taxonomic units (OTUs) reported by the Random Forest Classifier RFC-single were supported by RFC-global
Even trace amounts of body site microbiomes can be reliably identified in mixtures between body sites or between a body site and the environment. We evaluated detection limits and prediction performance of RFC-global on in silico mixtures of different body sites
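To make the idea of an in silico mixture concrete, the following is a minimal sketch of how two relative-abundance profiles could be blended at a chosen fraction, for example 1% of one source against 99% of another. The mixing rule (a convex combination of relative abundances), the function names `to_relative` and `mix_profiles`, and the toy OTU counts are all assumptions made for illustration; the excerpt here does not describe how the authors actually constructed their mixtures.

```python
def to_relative(counts):
    """Convert raw OTU counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counts")
    return {otu: n / total for otu, n in counts.items()}


def mix_profiles(minor, major, minor_fraction):
    """Blend two relative-abundance profiles as a convex combination.

    Assumption: a mixture is modelled as
    minor_fraction * minor + (1 - minor_fraction) * major.
    """
    if not 0.0 <= minor_fraction <= 1.0:
        raise ValueError("minor_fraction must lie in [0, 1]")
    otus = set(minor) | set(major)
    return {
        otu: minor_fraction * minor.get(otu, 0.0)
        + (1.0 - minor_fraction) * major.get(otu, 0.0)
        for otu in otus
    }


# Hypothetical toy profiles; OTU names and counts are invented.
saliva = to_relative({"otu_a": 80, "otu_b": 20})
soil = to_relative({"otu_b": 10, "otu_c": 90})

# A mixture in which saliva contributes 1% of the community.
mixture = mix_profiles(saliva, soil, minor_fraction=0.01)
```

A mixture built this way still sums to 1, so it could be passed to any classifier that expects relative abundances. A detection limit could then be probed by lowering `minor_fraction` until the minor source is no longer recovered.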

Summary

Introduction

The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Disease states can be associated with complex microbial patterns, making suitable biomarker detection and classification methods crucial for accurate disease identification and prediction [1,2,3,4]. Another important application is identifying the source of a sample, for example in forensic cases [5] or environmental monitoring [6]. In forensic cases, reliably determining the bodily source of a stain at a crime scene (e.g., saliva, semen, vaginal fluid, blood) can critically aid the reconstruction of crime events. Establishing the source of microbial communities is more complex when dealing with mixtures, such as contaminated samples, because these require distinguishing two or more different sources.

Methods

Results

Discussion

Conclusion

Journal: Microbiome
Publication Date: Oct 24, 2018
Citations: 31
License type: open-access
