Abstract
Data on diffuse-source annual flow-weighted total phosphorus (TP) concentrations from 349 Danish streams draining smaller catchments (< 50 km²) for the period 1990-2019 were used to develop a model in machine learning software (DataRobot version 6.2; DataRobot Inc., Boston, MA, USA). The developed diffuse-source TP-concentration model will replace an older model that has been used to calculate P loadings to Danish estuaries from ungauged areas. A total of 207 streams with 3,144 annual observations of flow-weighted TP concentrations, together with information on 19 explanatory variables, were entered into the DataRobot software. DataRobot divides the input data into three partitions: a training dataset (64%), a validation dataset (16%) and a holdout dataset (20%). Thereafter, DataRobot conducts a five-fold cross-validation and tests 72 different model types before suggesting the best final solutions. In this case, the TP-concentration model was developed as an 'eXtreme Gradient Boosted Trees Regressor with early stopping', which the DataRobot software suggested as superior for modelling the annual flow-weighted TP concentration based on 13 explanatory variables. The most influential explanatory variables in the final model are: 1) tile drainage in the catchments; 2) ; 3) period (two periods with different sampling regimes); 4) proportion of agricultural land; 5) importance of bank erosion; 6) deviation of annual runoff from the long-term mean. The final TP-concentration model has an R² = 0.69 for the training dataset, R² = 0.71 for the validation dataset and R² = 0.67 for the holdout dataset. A validation of the new machine learning TP-concentration model on 142 independent streams with 1,261 annual observations was conducted to investigate the uncertainty of the model simulations.
The validation showed that the TP-concentration model has high explanatory power (R² = 0.60) and very good simulation performance in the nine Danish georegions, as well as for the 30-year time series of data. An application of the model for calculating flow-weighted TP concentrations within nearly 3,200 catchment polygons (ID15s) covering the Danish land area showed that the newly developed machine learning TP model is a valuable tool both for calculating TP loadings from ungauged areas to lakes and coastal waters and for linking catchment pressures to stream ecological status.
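
The partition percentages quoted above follow from holding out 20% of the observations and splitting the remaining 80% into five equal folds, one of which (16% of all data) serves as the validation set while the other four (64%) are used for training. The following is a minimal sketch of that arithmetic together with the usual definition of R² as a coefficient of determination. It is not DataRobot's implementation: the function names `partition_indices` and `r_squared`, the random row-level shuffle, and the seed are assumptions made for illustration, and the abstract does not state whether observations were grouped by stream when partitioned.

```python
import random


def partition_indices(n_rows, holdout_frac=0.20, n_folds=5, seed=0):
    """Split row indices into a holdout set and n_folds cross-validation folds.

    Hypothetical helper: a plain random row-level split, assumed for illustration.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_frac)
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]
    # Deal the remaining rows into n_folds folds of (near-)equal size.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# 3,144 annual observations, as reported for the 207 model-development streams.
n_obs = 3144
holdout, folds = partition_indices(n_obs)
validation = folds[0]                                  # one fold validates
training = [i for fold in folds[1:] for i in fold]     # four folds train

for name, part in [("training", training), ("validation", validation), ("holdout", holdout)]:
    print(f"{name}: {len(part)} rows ({100 * len(part) / n_obs:.1f}%)")
```

In a five-fold cross-validation each of the five folds takes a turn as the validation set, so every non-holdout observation is used for validation exactly once; the holdout set is never used for fitting.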