Tracking metagenomic abundance in wastewater is a powerful tool for detecting emerging variants and improving community health. However, several factors limit genomic monitoring of environmental water: sampling variability, incomplete coverage, genetic fragmentation, degradation, and the difficulty of data analysis and interpretation. The decreasing costs of high-throughput sequencing and high-end supercomputers have increased the use and accuracy of genomic data for microbial detection and monitoring in wastewater samples from any given region. To better understand microbial dynamics and to determine the target sequencing throughput required to identify taxa that may serve as bio-indicators of an epidemiological outbreak, wastewater samples were collected from distinct locations within the Emirate of Abu Dhabi, United Arab Emirates, using appropriate sampling methods. A reference database of ∼27,000 known bacterial, viral, fungal, and protozoan species was developed and used for the analysis. The results showed that 15 % of the data in each sample matched a species in this database. Despite the high fraction of unclassified data (85 %), more than 2000 species from >800 genera across >30 phyla were detected in each sample. Both 5 Gb and 10 Gb of sequenced data detected the ∼2000 most abundant species. Doubling the target sequencing throughput (i.e., 10 Gb vs 5 Gb) detected ∼500 additional low-abundance species per sample; however, it did not affect the overall sample composition or translate into higher per-sample species diversity. Beyond 0.20 Gb of classified data, the number of species detected in each sample increased only marginally. Overall, the results indicate that sequencing to a 3 Gb throughput detects nearly 95 % of all species in the samples.