This study investigated the performance of decision tree-based models for predicting water quantity and quality. The models assessed were decision tree (DT), random forest (RF), and extreme gradient boosting (XGB), which were fed data sets collected from two monitoring stations in the Nakdong River during 2018-2021. The data were split at a 7:3 ratio into training and testing sets for the three prediction models, and their hyperparameters were tuned to improve prediction accuracy. We found that XGB, which was not sensitive to input data resolution, outperformed the other two models, DT and RF. In contrast, the prediction error of the DT model decreased progressively as the monitoring frequency increased from every 7 days through every 3 days to daily, as well as after post-pruning was applied, regardless of the dependent variable. When the prediction accuracy of the RF model was assessed as a function of the number of independent variables, using more than four variables was sufficient to maintain prediction performance comparable to that obtained with all variables. Therefore, in addition to hyperparameter optimization, both monitoring frequency and pruning play an important role in reducing the prediction error of decision tree models.
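The data preparation described above (monitoring intervals of 7, 3, and 1 day, followed by a 7:3 training/testing split) can be illustrated with a minimal sketch. The function names `subsample` and `split_7_3`, the synthetic series, and the order-preserving split are all assumptions made for illustration; the abstract does not state how the study resampled its records or whether the split was chronological or random.

```python
from typing import List, Sequence, Tuple


def subsample(records: Sequence[float], interval_days: int) -> List[float]:
    """Keep every `interval_days`-th observation of a daily record (hypothetical helper)."""
    if interval_days < 1:
        raise ValueError("interval_days must be >= 1")
    return list(records[::interval_days])


def split_7_3(records: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a record 7:3 into training and testing parts.

    Assumption: the split preserves order; the study may have split randomly.
    """
    cut = (len(records) * 7) // 10
    return list(records[:cut]), list(records[cut:])


# Synthetic placeholder series of 100 daily values; not data from the study.
daily = [float(i) for i in range(100)]

for interval in (7, 3, 1):
    series = subsample(daily, interval)
    train, test = split_7_3(series)
    print(f"interval={interval} d: n={len(series)}, train={len(train)}, test={len(test)}")
```

Coarser monitoring intervals leave far fewer samples for training, which is one plausible reason a single decision tree would be more sensitive to monitoring frequency than an ensemble.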