Abstract

Background: Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species. As a result, members of Clostridiales and Bacteroidales are particularly suitable for identifying sources of fecal contamination in environmental samples. However, a comprehensive evaluation of their predictive power and development of computational approaches is lacking. Given the global public health concern for waterborne disease, accurate identification of fecal pollution sources is essential for effective risk assessment and management. Here, we use the random forest algorithm and 16S rRNA gene amplicon sequences assigned to Clostridiales and Bacteroidales to identify common fecal pollution sources. We benchmarked the accuracy, consistency, and sensitivity of our classification approach using fecal, environmental, and artificial in silico generated samples.

Results: Clostridiales and Bacteroidales classifiers were composed mainly of sequences that displayed differential distributions (host-preferred) among sewage, cow, deer, pig, cat, and dog sources. Each classifier correctly identified human and individual animal sources in approximately 90% of the fecal and environmental samples tested. Misclassifications resulted mostly from false-positive identification of cat and dog fecal signatures in host animals not used to build the classifiers, suggesting that characterization of additional animals would improve accuracy. Random forest predictions were highly reproducible, reflecting the consistency of the bacterial signatures within each of the animal and sewage sources. Using in silico generated samples, we could detect fecal bacterial signatures when the source dataset accounted for as little as ~0.5% of the assemblage, with ~0.04% of the sequences matching the classifiers. Finally, we developed a proxy to estimate proportions among sources, which allowed us to determine which sources contribute the most to observed fecal pollution.

Conclusion: Random forest classification with 16S rRNA gene amplicons offers a rapid, sensitive, and accurate solution for identifying host microbial signatures to detect human and animal fecal contamination in environmental samples.
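The sensitivity figures in the abstract can be made concrete with a small dilution experiment. The sketch below is only an illustration, not the authors' pipeline: the function names, ASV labels, and read counts are all hypothetical. It mixes a toy fecal source into an environmental background so that the source makes up 0.5% of the assemblage, then measures the fraction of reads that match classifier sequences. Taken at face value, the two reported figures imply that roughly 8% of source reads match the classifier (0.04 / 0.5), so the toy source is built with that share.

```python
import random

def mix_in_silico(source_reads, background_reads, source_fraction, total, seed=0):
    """Build an artificial sample in which `source_fraction` of reads come from the source."""
    rng = random.Random(seed)
    n_source = round(total * source_fraction)
    mixed = rng.choices(source_reads, k=n_source)               # resample fecal source reads
    mixed += rng.choices(background_reads, k=total - n_source)  # fill with background reads
    return mixed

def classifier_match_fraction(reads, classifier_asvs):
    """Fraction of reads whose ASV is one of the classifier sequences."""
    return sum(read in classifier_asvs for read in reads) / len(reads)

# Hypothetical toy data: 8 of every 100 source reads carry a classifier ASV,
# and the environmental background carries none.
classifier_asvs = {"host_asv_1", "host_asv_2"}
source_reads = ["host_asv_1"] * 5 + ["host_asv_2"] * 3 + ["gut_other"] * 92
background_reads = ["env_asv_1", "env_asv_2", "env_asv_3"]

sample = mix_in_silico(source_reads, background_reads, source_fraction=0.005, total=100_000)
match_fraction = classifier_match_fraction(sample, classifier_asvs)
print(f"{match_fraction:.4%} of reads match classifier sequences")
```

Because the source reads are resampled at random, the printed value varies with the seed, but it should land near 0.04% and can never exceed the 0.5% contributed by the source.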

Highlights

  • Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species

  • 48% of the V6 sequences were assigned to Clostridiales and 35% to Bacteroidales in the animal fecal samples

  • The minimum entropy decomposition (MED) analysis retained 90% of the total sequences, i.e., 21,965,364 Clostridiales sequences and 15,900,401 Bacteroidales sequences. These sequences were clustered into 2724 amplicon sequence variants (ASVs) for Clostridiales and 1479 ASVs for Bacteroidales

Summary

Introduction

Clostridiales and Bacteroidales are uniquely adapted to the gut environment and have co-evolved with their hosts, resulting in convergent microbiome patterns within mammalian species. Although SourceTracker could be used for fecal source identification, each new investigation requires all source and sink samples of interest to be re-analyzed de novo. This setup requires investigators to either generate microbial source sequence data (e.g., from human and animal fecal samples) or mine databases for appropriate information to pair with their environmental samples, which limits how widely it can be used. Random forest is a model that can handle unbalanced sample distributions and is less prone to overfitting, which produces unbiased classifiers [22]. This machine learning approach has been used to classify body site, subject, and diagnoses using human microbiome datasets [23], but its performance has not been evaluated for fecal source identification.
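To illustrate the idea of training a classifier once and reusing it on new samples, here is a minimal random forest written with the Python standard library only. It is a toy sketch under stated assumptions, not the study's implementation: the feature layout (two host-specific ASVs each for sewage, cow, and dog), the relative abundances, the deliberately unbalanced class sizes, and all function names are hypothetical. Each tree is grown on a bootstrap sample of the rows and considers a random subset of features at every split; the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(X, y, rng, k, depth=0, max_depth=4):
    """Grow one tree, trying k randomly chosen features at every split."""
    if len(set(y)) == 1 or depth == max_depth:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), k):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = sum(len(p) * gini([y[i] for i in p]) for p in (left, right))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features do not vary in this node
        return majority(y)
    _, f, t, left, right = best
    children = [grow_tree([X[i] for i in p], [y[i] for i in p], rng, k, depth + 1, max_depth)
                for p in (left, right)]
    return (f, t, children[0], children[1])

def fit_forest(X, y, n_trees=51, seed=0):
    rng = random.Random(seed)
    k = max(1, int(len(X[0]) ** 0.5))  # common heuristic: sqrt(number of features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample of the rows
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, k))
    return forest

def predict(forest, row):
    votes = Counter()
    for tree in forest:
        while isinstance(tree, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = tree
            tree = left if row[f] <= t else right
        votes[tree] += 1
    return votes.most_common(1)[0][0]

# Hypothetical relative abundances; columns: sewage_a, sewage_b, cow_a, cow_b, dog_a, dog_b
X = [
    [0.30, 0.12, 0.0, 0.0, 0.0, 0.0], [0.25, 0.15, 0.0, 0.0, 0.0, 0.0],
    [0.35, 0.10, 0.0, 0.0, 0.0, 0.0], [0.28, 0.14, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.40, 0.20, 0.0, 0.0], [0.0, 0.0, 0.33, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.45, 0.18, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.31, 0.22], [0.0, 0.0, 0.0, 0.0, 0.27, 0.26],
]
y = ["sewage"] * 4 + ["cow"] * 3 + ["dog"] * 2  # unbalanced on purpose

forest = fit_forest(X, y)
new_sample = [0.0, 0.0, 0.38, 0.21, 0.0, 0.0]  # unseen, cow-like profile
print(predict(forest, new_sample))
```

Once fitted, the forest can score any new environmental sample without revisiting the source data, which is the practical contrast with re-analyzing sources and sinks together. Real data would have thousands of ASV features with overlapping distributions, where a maintained library implementation is the better choice.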

