Accurately assessing and managing risks associated with inorganic pollutants in groundwater is imperative. Historic water quality databases are often sparse due to rationale or financial budgets for sample collection and analysis, posing challenges in evaluating exposure or water treatment effectiveness. We utilized and compared two advanced multiple data imputation techniques, AMELIA and MICE algorithms, to fill gaps in sparse groundwater quality data sets. AMELIA outperformed MICE in handling missing values, as MICE tended to overestimate certain values, resulting in more outliers. Field data sets revealed that 75% to 80% of samples exhibited no co-occurring regulated pollutants surpassing MCL values, whereas imputed values showed only 15% to 55% of the samples posed no health risks. Imputed data unveiled a significant increase, ranging from 2 to 5 times, in the number of sampling locations predicted to potentially exceed health-based limits and identified samples where 2 to 6 co-occurring chemicals may occur and surpass health-based levels. Linking imputed data to sampling locations can pinpoint potential hotspots of elevated chemical levels and guide optimal resource allocation for additional field sampling and chemical analysis. With this approach, further analysis of complete data sets allows state agencies authorized to conduct groundwater monitoring, often with limited financial resources, to prioritize sampling locations and chemicals to be tested. Given existing data and time constraints, it is crucial to identify the most strategic use of the available resources to address data gaps effectively. This work establishes a framework to enhance the beneficial impact of funding groundwater data collection by reducing uncertainty in prioritizing future sampling locations and chemical analyses.
Read full abstract