Because sediment measurements are laborious and costly, alternative techniques are needed to estimate sediment concentration from more easily measured variables. The objective of this study was therefore to use machine learning-based models to predict the surface sediment concentration (SSC) in the Doce river basin. The cross-sectional averages of measurements from seven sediment monitoring stations of the Agência Nacional de Águas e Saneamento Básico located in the Doce riverbed were used as the SSC data. A total of 62 predictor variables were used, derived from data on terrain slope, pedology, land use and cover, precipitation, river discharge and velocity, actual evapotranspiration, surface runoff, soil moisture, temperature, and the normalized difference vegetation index. The Boruta and recursive feature elimination variable selection methods were employed to reduce the number of predictor variables. The random forest, Cubist, support vector machine, and eXtreme Gradient Boosting (XGBoost) algorithms, as well as least absolute shrinkage and selection operator (LASSO) regression, were applied to predict the SSC data. The machine learning algorithms outperformed LASSO regression, particularly the Cubist and XGBoost models, which exhibited the lowest prediction errors and the highest efficiency metrics. According to the varImp function from the caret package, the most important predictor variables for SSC modeling were the daily river discharge on the sediment collection date and the time-lagged discharge. The cumulative daily mean precipitation was also important for the sediment modeling. Our findings demonstrate that machine learning models can be a useful tool for sediment monitoring and for understanding sediment dynamics in the Doce river basin over time.
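The predictors singled out above (discharge on the collection date, time-lagged discharge, and cumulative precipitation) can be derived from daily series in a few lines. The following is only a minimal sketch in plain Python, not the study's actual workflow, which used R packages such as caret. The function names, the one-day lag, the three-day window, and the numeric values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Optional, Sequence


def lagged(series: Sequence[float], lag: int) -> List[Optional[float]]:
    """Value of `series` `lag` days earlier; None where no earlier value exists."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [series[i - lag] if i >= lag else None for i in range(len(series))]


def cumulative(series: Sequence[float], window: int) -> List[Optional[float]]:
    """Sum over the `window` days ending on each day; None until the window is full."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window : i + 1]))
    return out


# Made-up daily values, for illustration only (not data from the study).
discharge = [120.0, 135.0, 180.0, 260.0, 210.0]  # river discharge, m^3/s
precip = [0.0, 12.0, 30.0, 5.0, 0.0]             # basin-mean precipitation, mm/day

# Hypothetical predictor names; the lag and window lengths are assumptions.
features = {
    "q_lag0": lagged(discharge, 0),     # discharge on the collection date
    "q_lag1": lagged(discharge, 1),     # discharge one day earlier
    "p_cum3": cumulative(precip, 3),    # 3-day cumulative precipitation
}
```

Rows where a lagged or cumulative value is `None` would typically be dropped, or the series extended backwards in time, before fitting any model.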