Abstract

Artificial neural networks (ANNs) have become a very popular tool in hydrology, especially in rainfall–runoff modelling. However, a number of issues must be addressed to apply this technique to a particular problem efficiently, including the selection of the network type, its architecture, a proper optimization algorithm, and a method to deal with overfitting of the data. The present paper addresses the last, rarely considered issue, namely the comparison of methods to prevent multi-layer perceptron neural networks from overfitting the training data in the case of daily catchment runoff modelling. Among the methods to avoid overfitting, early stopping, noise injection and weight decay have been known for about two decades; however, only the first is frequently applied in practice. Recently, a new methodology called the optimized approximation algorithm has been proposed in the literature. Overfitting of the training data leads to a deterioration of the generalization properties of the model and results in untrustworthy performance when the model is applied to novel measurements. Hence the purpose of methods to avoid overfitting is somewhat contradictory to the goal of optimization algorithms, which aim at finding the best possible solution in the parameter space according to a pre-defined objective function and the available data. Moreover, different optimization algorithms may perform better for simpler or larger ANN architectures. This suggests the importance of properly coupling optimization algorithms, ANN architectures and methods to avoid overfitting on real-world data, an issue that is also studied in detail in the present paper. The study is performed for the Annapolis River catchment, which is characterized by significant seasonal changes in runoff, rapid floods during winter and spring, moderately dry summers, and severe winters with snowfall, snow melting, frequent freeze and thaw, and the presence of river ice. The present paper shows that the noise injection method elaborated here may prevent overfitting slightly better than the most popular early stopping approach. However, applying noise injection to real-world problems is difficult, and the final model performance depends significantly on a number of very technical details, which somewhat limits its practical applicability. It is shown that the optimized approximation algorithm does not improve on the results obtained by the older methods, possibly due to its over-simplified stopping criterion. Extensive calculations reveal that the Evolutionary Computation-based algorithm performs better for simpler ANN architectures, whereas the classical gradient-based Levenberg–Marquardt algorithm is able to benefit from additional input variables, representing precipitation and snow cover from one more previous day, and from more complicated ANN architectures. This confirms that the curse of dimensionality has a severe impact on the performance of Evolutionary Computing methods.
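
For readers who want a concrete picture of the classical overfitting countermeasures compared above, the following minimal sketch illustrates early stopping, weight decay and per-epoch noise injection using scikit-learn's MLPRegressor. Everything in it is an assumption made for illustration: the data are synthetic placeholders for the catchment records, and the network size, noise scale sigma and other hyper-parameters are hypothetical; it does not reproduce the paper's own implementations.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Synthetic stand-ins for the paper's data; in practice X would hold
    # lagged precipitation, temperature and snow-cover records and y the
    # observed daily runoff of the catchment.
    rng = np.random.default_rng(42)
    X = rng.random((1000, 6))
    y = rng.random(1000)

    # Early stopping: hold out part of the training data and stop once
    # the validation score has not improved for n_iter_no_change epochs.
    mlp_es = MLPRegressor(hidden_layer_sizes=(8,),
                          early_stopping=True,
                          validation_fraction=0.2,
                          n_iter_no_change=20,
                          max_iter=2000,
                          random_state=0)
    mlp_es.fit(X, y)

    # Weight decay: in scikit-learn the L2 penalty `alpha` plays this
    # role, shrinking the network weights during training.
    mlp_wd = MLPRegressor(hidden_layer_sizes=(8,), alpha=1e-3,
                          max_iter=2000, random_state=0)
    mlp_wd.fit(X, y)

    # Noise injection: draw fresh zero-mean Gaussian noise for the
    # inputs at every epoch. The noise scale `sigma` is exactly the kind
    # of technical detail on which, as the abstract notes, the final
    # performance strongly depends.
    sigma = 0.05
    mlp_ni = MLPRegressor(hidden_layer_sizes=(8,), random_state=0)
    for epoch in range(200):
        X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
        mlp_ni.partial_fit(X_noisy, y)

Note that scikit-learn trains the networks with its own solvers (Adam by default) rather than the Levenberg–Marquardt or Evolutionary Computation algorithms studied in the paper; the sketch only shows how each regularization idea enters a training loop.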
