Abstract

Data on diffuse-source annual flow-weighted total phosphorus (TP) concentrations from 349 Danish streams draining small catchments (< 50 km²) for the period 1990–2019 were used to develop a model in the machine learning software DataRobot (version 6.2; DataRobot Inc., Boston, MA, USA). The new diffuse-source TP-concentration model will replace an older model that has been used to calculate P loadings to Danish estuaries from ungauged areas. A total of 207 streams with 3,144 annual observations of flow-weighted TP concentrations, together with information on 19 explanatory variables, were entered into DataRobot. DataRobot divides the input data into three partitions: a training dataset (64%), a validation dataset (16%) and a holdout dataset (20%). It then performs five-fold cross-validation and tests 72 different model types before recommending the best-performing solutions. In this case, DataRobot ranked an ‘eXtreme Gradient Boosted Trees Regressor with early stopping’ based on 13 explanatory variables as the best model for the annual flow-weighted TP concentration. The most influential explanatory variables in the final model are: 1) tile drainage in the catchments; 2) ; 3) period (two periods with different sampling regimes); 4) proportion of agricultural land; 5) importance of bank erosion; and 6) deviation of annual runoff from the long-term mean. The final TP-concentration model has R² = 0.69 for the training dataset, R² = 0.71 for the validation dataset and R² = 0.67 for the holdout dataset. The model was then validated on 142 independent streams with 1,261 annual observations to assess the uncertainty of its simulations.
The validation showed that the TP-concentration model has high explanatory power (R² = 0.60) and performs very well in all nine Danish georegions as well as across the 30-year time series. Applying the model to calculate flow-weighted TP concentrations for nearly 3,200 catchment polygons (ID15’s) covering the Danish land area showed that the newly developed machine learning TP model is a valuable tool both for calculating TP loadings from ungauged areas to lakes and coastal waters and for linking catchment pressures to stream ecological status.
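The main quantities in this abstract (flow-weighted concentration, the 64/16/20 data partition with five-fold cross-validation, and R²) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code or DataRobot's internals. The function names are hypothetical. The flow-weighted concentration is taken as total load divided by total runoff (sum of C·Q over sum of Q, in any consistent units). The 64/16/20 split is reproduced by holding out 20% of rows at random and dividing the remaining 80% into five folds, one of which (16% of all rows) serves as the validation set. R² is the standard coefficient of determination.

```python
import random


def flow_weighted_concentration(concentrations, flows):
    """Flow-weighted mean concentration: sum(C_i * Q_i) / sum(Q_i).

    Equivalent to total load divided by total runoff over the period.
    Assumes paired concentration/flow values in consistent units.
    """
    if len(concentrations) != len(flows) or not flows:
        raise ValueError("need equal-length, non-empty inputs")
    total_flow = sum(flows)
    if total_flow <= 0:
        raise ValueError("total flow must be positive")
    load = sum(c * q for c, q in zip(concentrations, flows))
    return load / total_flow


def partition(n_rows, n_folds=5, holdout_frac=0.20, seed=0):
    """Randomly assign row indices to a holdout set and n_folds CV folds.

    With the defaults, fold 0 (16% of rows) can act as the validation set
    and folds 1-4 (64% of rows) as the training set. Hypothetical helper;
    rows are split individually, not grouped by stream.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_holdout = round(holdout_frac * n_rows)
    holdout, rest = idx[:n_holdout], idx[n_holdout:]
    # Deal the remaining rows round-robin into n_folds folds.
    folds = [rest[k::n_folds] for k in range(n_folds)]
    return holdout, folds


def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Partition sizes for 3,144 annual observations.
    holdout, folds = partition(3144)
    validation = folds[0]
    training = [i for fold in folds[1:] for i in fold]
    print(len(training), len(validation), len(holdout))  # 2012 503 629
```

Because the streams contribute several annual observations each, splitting by stream rather than by row would keep all years of a stream in the same subset, which is closer in spirit to the independent-stream validation described above. The abstract does not say which scheme was used, so the row-level split here is only an assumption.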
