A random forest regression (RFR) model was applied to over 12,000 wells with measured fluoride (F) concentrations in untreated groundwater to predict F concentrations at depths used for domestic and public supply in basin-fill aquifers of the western United States. The model relied on twenty-two regional-scale environmental and surficial predictor variables selected to represent factors known to control F concentrations in groundwater. On the testing data, the model fit had an R² of 0.52 and an RMSE of 0.78 mg/L. Comparisons of measured to predicted proportions of four F-concentration categories (<0.7 mg/L, 0.7–2 mg/L, >2–4 mg/L, and >4 mg/L) indicate that the model performed well at making regional-scale predictions. Differences between measured and predicted proportions indicate underprediction of measured F at values between 4 and 20 mg/L, representing less than 1% of the regional-scale predicted values. These residuals most often map to geographic regions where local-scale processes, including evaporative discharge in closed basins or intermittent streams, concentrate fluoride in shallow groundwater. Despite this, the RFR model provides spatially continuous F predictions across the basin-fill aquifers where discrete samples are missing. Further, the predictions capture documented areas that exceed the F maximum contaminant level for drinking water of 4 mg/L and areas that are below the oral-health benchmark of 0.7 mg/L. These predictions can be used to estimate fluoride concentrations in unmonitored areas and to aid in identifying geographic areas that may require further investigation at localized scales.
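The category comparison and fit statistics described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the sample values are invented for illustration, R² is assumed to be the coefficient of determination, and the boundary handling (0.7 and 2 mg/L fall in the second category, 4 mg/L in the third) is inferred from the category labels in the abstract.

```python
from math import sqrt

# Category labels taken from the abstract (mg/L).
LABELS = ["<0.7", "0.7-2", ">2-4", ">4"]


def categorize(f_mg_per_l):
    """Return the index (0-3) of the F-concentration category for one value."""
    if f_mg_per_l < 0.7:
        return 0
    if f_mg_per_l <= 2.0:
        return 1
    if f_mg_per_l <= 4.0:
        return 2
    return 3


def category_proportions(values):
    """Return the fraction of values that fall in each of the four categories."""
    counts = [0, 0, 0, 0]
    for v in values:
        counts[categorize(v)] += 1
    return [c / len(values) for c in counts]


def r2_and_rmse(measured, predicted):
    """Return (R^2, RMSE), with R^2 as the coefficient of determination."""
    n = len(measured)
    mean_measured = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot, sqrt(ss_res / n)


# Invented example values (mg/L), NOT data from the study.
measured = [0.2, 0.5, 1.0, 1.5, 3.0, 6.0, 12.0, 0.4]
predicted = [0.3, 0.6, 0.9, 1.8, 2.5, 3.5, 5.0, 0.5]

for label, m, p in zip(LABELS,
                       category_proportions(measured),
                       category_proportions(predicted)):
    print(f"{label:>6} mg/L  measured={m:.3f}  predicted={p:.3f}")

r2, rmse = r2_and_rmse(measured, predicted)
print(f"R2={r2:.2f}  RMSE={rmse:.2f} mg/L")
```

In the invented sample, the predicted share of the >4 mg/L category is smaller than the measured share, which is the kind of high-end underprediction the abstract reports.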