Abstract

Background: Contamination of drinking water by nitrate is a growing problem in many agricultural areas of the United States. Ingested nitrate can lead to the endogenous formation of N-nitroso compounds (NOC), which are potent animal carcinogens.

Objective: The goal was to develop a predictive model for nitrate levels in private wells in Iowa for the Agricultural Health Study.

Methods: We obtained 34,084 measurements of nitrate in private wells, along with well depth and location. We created a training set of approximately 30% of the observations (n = 11,940) with sampling stratified on decade of sample (1980s, 1990s, 2000s) and bedrock status (well in or above bedrock). We built random forest models to predict log nitrate levels in the training set by systematically assessing the predictive performance of 179 variables in 36 thematic groups (well depth, land use, soil characteristics, nitrogen inputs, meteorology, and other factors).

Results: The final model contained 17 variable groups and 66 variables. The most important variable was well depth. Other important variables included slope length within 1 km of the well, year of sample, distance to the nearest animal feeding operation, and fertilizer applications within 1 km of the well in 1978. The mean squared error in the validation samples in the training set was 2.56, and the pseudo R-square of the estimated log nitrate in the training set was 0.81. We predicted nitrate levels in the testing set (n = 22,153) with good agreement (r = 0.62) between reported nitrate levels and model predictions. The random forest model had substantially better predictive performance than a traditional linear regression model or a classification tree.

Conclusion: Our results demonstrate that a random forest is useful for predicting nitrate levels in private wells. The random forest model predictions of log nitrate will be used to investigate the association between nitrate levels in drinking water and cancer risk in the Agricultural Health Study cohort.
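As an illustration of two steps named in the abstract, the stratified training/testing split and the error metrics, the following is a minimal sketch on synthetic records, not the study data. The field names (`decade`, `in_bedrock`, `log_nitrate`), the helper function names, and the definition of pseudo R-square as 1 - MSE / Var(y) are assumptions made for the example, since the abstract does not specify them. The random forest itself is not implemented here.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def stratified_split(records, strata_key, train_fraction, seed=0):
    """Split records into (train, test), sampling the same fraction in each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[strata_key(rec)].append(rec)

    train, test = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return fmean((o - p) ** 2 for o, p in zip(observed, predicted))


def pseudo_r_squared(observed, predicted):
    """Assumed definition: 1 - MSE / Var(observed), using the population variance."""
    return 1.0 - mean_squared_error(observed, predicted) / pvariance(observed)


# Synthetic wells (hypothetical values): 100 per decade-by-bedrock stratum.
data_rng = random.Random(42)
wells = [
    {"decade": decade, "in_bedrock": in_bedrock, "log_nitrate": data_rng.gauss(1.0, 1.5)}
    for decade in (1980, 1990, 2000)
    for in_bedrock in (False, True)
    for _ in range(100)
]

# Stratify on decade of sample and bedrock status, keeping 30% for training.
train, test = stratified_split(
    wells,
    strata_key=lambda w: (w["decade"], w["in_bedrock"]),
    train_fraction=0.3,
)
print(len(train), len(test))
```

Stratifying on both variables keeps each decade and bedrock category represented in the training set in the same proportion as in the full data, which matters when strata differ in size.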
