Big data have become increasingly important for policymakers and scientists but have yet to be employed for developing spatially specific groundwater contamination indices or for protecting human and environmental health. The current study sought to develop a series of indices via analyses of three variables: non-E. coli coliform (NEC) concentration, E. coli concentration, and the calculated NEC:E. coli concentration ratio. A large microbial water quality dataset comprising 1,104,094 samples collected from 292,638 Ontarian wells between 2010 and 2021 was used. Getis-Ord Gi* (Gi*), Local Moran's I (LMI), and space-time scanning were employed for index development based on identified cluster recurrence. Gi* and LMI identify hot and cold spots, i.e., spatially proximal subregions with similarly high or low contamination magnitudes. For evaluation and final index selection, indices were statistically compared with mapped well density and age-adjusted enteric infection rates (i.e., campylobacteriosis, cryptosporidiosis, giardiasis, verotoxigenic E. coli (VTEC) enteritis) at a subregional (N = 298) resolution. Findings suggest that index development via Gi* represented the most efficacious approach. Developed Gi* indices exhibited no correlation with well density, implying that indices are not biased by rural population density. Gi* indices exhibited positive correlations with mapped infection rates, and were particularly associated with higher bacterial (Campylobacter, VTEC) infection rates among younger sub-populations (p < 0.05). Conversely, no association was found between developed indices and giardiasis rates, an infection not typically associated with private groundwater contamination. Findings suggest that a notable proportion of bacterial infections are associated with groundwater and that the developed Gi* index represents an appropriate spatiotemporal reflection of long-term groundwater quality. Bacterial infection correlations with the NEC:E. coli ratio index (p < 0.001) differed markedly from correlations with the E. coli index, implying that the ratio may supplement E. coli monitoring as a groundwater assessment metric capable of elucidating contamination mechanisms. This study may serve as a methodological blueprint for the development of big data-based groundwater contamination indices across the globe.
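
As a rough illustration of the hot- and cold-spot idea described above, the following Python sketch computes the standard Getis-Ord Gi* z-score for each subregion from a vector of contamination values and a spatial weights matrix. It is not the study's implementation, which the abstract does not describe: the function name `gi_star`, the binary adjacency weights, and the six toy values are all assumptions made for illustration only.

```python
import math


def gi_star(values, weights):
    """Return the Getis-Ord Gi* z-score for every subregion.

    values  -- one contamination value per subregion (hypothetical inputs)
    weights -- square matrix; weights[i][j] > 0 when j is a neighbour of i.
               Gi* includes the focal subregion, so weights[i][i] must be set.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation as used in the Gi* formula (divides by n).
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)

    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        # Observed local sum minus its expectation under spatial randomness.
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        if denominator == 0:
            raise ValueError("Gi* is undefined: no variation in values or weights")
        scores.append(numerator / denominator)
    return scores


# Toy example (assumed data): six subregions along a line, where each
# subregion neighbours itself and the subregions immediately beside it.
toy_values = [9.0, 8.0, 7.0, 1.0, 1.0, 2.0]
toy_weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(6)]
               for i in range(6)]
toy_scores = gi_star(toy_values, toy_weights)
```

Large positive scores mark candidate hot spots and large negative scores mark candidate cold spots; a common convention treats |z| > 1.96 as significant at roughly the 5% level before any multiple-testing correction. An index based on cluster recurrence could then count how often a subregion is flagged across successive time windows, though the exact rule used in the study is not given here.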