ABSTRACT Unconventional oil and gas (UOG) extraction is increasing exponentially around the world, as new technological advances have provided cost-effective methods to extract hard-to-reach hydrocarbons. While UOG has increased the energy output of some countries, past research indicates potential impacts in nearby stream ecosystems as measured by geochemical and microbial markers. Here, we utilized a robust data set that combines 16S rRNA gene amplicon sequencing (DNA), metatranscriptomics (RNA), geochemistry, and trace element analyses to establish the impact of UOG activity in 21 sites in northern Pennsylvania. These data were also used to design predictive machine learning models to determine the UOG impact on streams. We identified multiple biomarkers of UOG activity and contributors of antimicrobial resistance within the order Burkholderiales. Furthermore, we identified expressed antimicrobial resistance genes, land coverage, geochemistry, and specific microbes as strong predictors of UOG status. Of the predictive models constructed (n = 30), 15 had accuracies higher than expected by chance and area under the curve values above 0.70. The supervised random forest models with the highest accuracy were constructed with 16S rRNA gene profiles, metatranscriptomics active microbial composition, metatranscriptomics active antimicrobial resistance genes, land coverage, and geochemistry (n = 23). The models identified the most important features within those data sets for classifying UOG status. These findings identified specific shifts in gene presence and expression, as well as geochemical measures, that can be used to build robust models to identify impacts of UOG development.

IMPORTANCE The environmental implications of unconventional oil and gas extraction are only recently starting to be systematically recorded.
Our research shows the utility of microbial communities paired with geochemical markers to build strong predictive random forest models of unconventional oil and gas activity and the identification of key biomarkers. Microbial communities, their transcribed genes, and key biomarkers can be used as sentinels of environmental changes. Slight changes in microbial function and composition can be detected before chemical markers of contamination. Potential contamination, specifically from biocides, is especially concerning due to its potential to promote antibiotic resistance in the environment. Additionally, as microbial communities facilitate the bulk of nutrient cycling in the environment, small changes may have long-term repercussions. Supervised random forest models can be used to identify changes in those communities, greatly enhance our understanding of what such impacts entail, and inform environmental management decisions.
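
The screening criterion mentioned in the abstract (accuracy above chance and area under the curve above 0.70) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the 0.5 score cutoff, the use of the majority-class rate as the "chance" baseline, and the toy labels and scores are all assumptions made for illustration only.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample is scored
    above a random negative sample (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def majority_baseline(labels):
    """Accuracy of always guessing the most common class
    (an assumed stand-in for 'expected by chance')."""
    return max(Counter(labels).values()) / len(labels)


def accuracy(labels, scores, cutoff=0.5):
    """Fraction of samples classified correctly at an assumed score cutoff."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def passes_screen(labels, scores, auc_threshold=0.70):
    """True when a model beats the baseline accuracy and the AUC threshold."""
    return (accuracy(labels, scores) > majority_baseline(labels)
            and roc_auc(labels, scores) > auc_threshold)


# Hypothetical toy data: 1 = UOG-impacted stream, 0 = not impacted.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

print(roc_auc(labels, scores), accuracy(labels, scores),
      passes_screen(labels, scores))
```

In practice the scores would come from held-out predictions of a trained classifier such as a random forest. The pairwise AUC above is quadratic in the number of samples, which is acceptable for a small study but not for large data sets.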