Deep neural networks (DNNs) are a powerful tool for predicting and monitoring water quality. However, their application is limited to well-monitored zones where sufficient data are available for the training and validation phases. In this study, we develop a novel framework based on a Virtual Sample Generation (VSG) method that uses multivariate distributions (MVD), specifically elliptical copulas, to broaden the application of DNNs to water quality prediction even with a small dataset. The framework is evaluated by predicting the Entropy Weighted Water Quality Index (EWQI) with a DNN, using Electrical Conductivity, Temperature, and pH as input variables, in the Berrechid and Chaouia aquifer systems, Morocco. Validation results showed that the virtual samples generated from 400, 50, 30, and 20 original samples improved the Nash-Sutcliffe efficiency (NSE) from 0.88 to 0.92, from 0.53 to 0.91, from 0.42 to 0.91, and from 0.24 to 0.87, respectively. In addition, a sensitivity analysis of the methodology with respect to the virtual data size and the original sample size showed that the RMSE and NSE of the DNN models approach limits as the virtual data size increases, following first-order exponential decay and logistic trends, respectively. These limits depend strongly on the original sample size. Such empirical trends are crucial for reproducing the proposed methodology at other sites and for determining the optimal virtual dataset size. Overall, the proposed methodology provides new insights into improving DNN performance in predicting water quality with small datasets. Hence, it is useful for managing water quality in order to supply clean water to populations in poorly monitored zones.
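As a rough illustration of the copula-based virtual sample generation idea summarized above, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian copula (one member of the elliptical family) with empirical marginals; the abstract does not state which copula or marginal model was actually used. The function names `gaussian_copula_vsg` and `nse` are hypothetical, and the NSE helper follows the standard Nash-Sutcliffe definition.

```python
import numpy as np
from scipy import stats


def gaussian_copula_vsg(data, n_virtual, seed=0):
    """Draw virtual samples from a Gaussian copula fitted to `data`.

    Assumption: Gaussian copula with empirical marginals. The study's
    exact copula family and marginal fitting are not given in the abstract.
    `data` has shape (n_samples, n_variables).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, d = data.shape

    # 1. Rank-transform each column to pseudo-observations in (0, 1).
    u = stats.rankdata(data, axis=0) / (n + 1)

    # 2. Map to normal scores and estimate the copula correlation matrix.
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)

    # 3. Sample correlated normals and map them back to uniforms.
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_virtual)
    u_new = stats.norm.cdf(z_new)

    # 4. Invert the empirical marginals with column-wise quantiles.
    return np.column_stack(
        [np.quantile(data[:, j], u_new[:, j]) for j in range(d)]
    )


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

In a workflow like the one described, each row of `data` might hold the input variables together with the target index, so that the virtual rows preserve their joint dependence before being added to the DNN training set. Because the marginals are inverted empirically, the virtual values stay within the range of the original sample; a parametric marginal would be needed to extrapolate beyond it.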