Abstract

Monitoring groundwater (GW) quality is essential for the sustainable management of water resources and for preserving public health and ecosystem functioning. The present study developed a machine learning (ML) modeling framework that uses high-throughput sequencing microbiome data as input variables and successfully predicted the status and source of GW pollution. No systematic spatiotemporal patterns in the environmental parameters or community diversity indices were observed for the GW samples taken from a total petroleum hydrocarbon (TPH)-contaminated site. In contrast, ML modeling optimized via model selection and hyperparameter tuning achieved high prediction accuracy (>98%) in classifying the status and source of GW pollution. Feature importance analysis using the ML models (logistic regression and a support vector machine with a radial basis function kernel) identified members of Rhodocyclaceae, Syntrophaceae, and Helicobacteraceae as strong indicators of GW polluted with TPHs. The identification of these microbial taxa as pollution indicators was consistent with their known ecophysiology associated with TPH metabolism. The usefulness of these microbial indicators was then validated using both conventional hypothesis testing and phylogenetic analysis. Overall, the ML modeling pipeline established in this study using microbiome data provides new information on the interaction between a set of microbial biomarkers and enhances the predictive understanding of GW pollution and its bioremediation potential.
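
The workflow summarized above (train a classifier on taxon abundances, tune a hyperparameter, then rank taxa by their influence on the prediction) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline. The data are synthetic, the four taxa are hypothetical placeholders, and a hand-written logistic regression with a single held-out validation split stands in for whatever model selection and tuning procedure the study actually used. Coefficient magnitude is used here as a simple proxy for feature importance, which is also an assumption.

```python
import math
import random


def make_samples(n, seed):
    """Synthetic abundances for 4 hypothetical taxa (assumption, not study data).

    Taxon 0 is shifted upward in polluted samples; taxa 1-3 are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        polluted = i % 2  # alternate clean (0) and polluted (1) samples
        row = [rng.gauss(0.0, 1.0) for _ in range(4)]
        row[0] += 3.0 * polluted
        X.append(row)
        y.append(polluted)
    return X, y


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, row):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


def fit_logistic(X, y, l2, epochs=200, lr=0.1):
    """Batch gradient descent on L2-regularized logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = predict_proba((w, b), row) - target
            for j, xj in enumerate(row):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(model, X, y):
    hits = sum((predict_proba(model, row) >= 0.5) == bool(t) for row, t in zip(X, y))
    return hits / len(y)


X_train, y_train = make_samples(200, seed=1)
X_val, y_val = make_samples(100, seed=2)

# Hyperparameter tuning: choose the L2 strength with the best validation accuracy.
candidates = [0.0, 0.01, 0.1]
best_l2 = max(
    candidates,
    key=lambda l2: accuracy(fit_logistic(X_train, y_train, l2), X_val, y_val),
)
weights, bias = fit_logistic(X_train, y_train, best_l2)
val_accuracy = accuracy((weights, bias), X_val, y_val)

# Rank taxa by absolute coefficient as a crude importance score.
ranking = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
print("best L2:", best_l2)
print("validation accuracy:", val_accuracy)
print("taxa ranked by |coefficient|:", ranking)
```

In this toy setting the taxon that was deliberately shifted in the polluted samples should rank first, which mirrors the idea of recovering indicator taxa from a fitted model. With real microbiome data, cross-validation and a model-agnostic importance measure would likely be more appropriate than a single split and raw coefficients.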
