Water quality is a concern in most river basins worldwide because of the widespread release of pollutants, which impacts freshwater ecosystems. Exploring the relationships between drivers and water quality parameters at the regional scale is key to identifying appropriate actions for reducing water pollution. Regional models are the appropriate tool for this purpose, though their development poses significant challenges because of the complexity and nonlinearity of such relationships. Among the available approaches, Machine Learning (ML) is promising because of its capability to detect complex nonlinear relationships and its flexible parameterization, which is learned from data. In this work, we developed regional models of water temperature, dissolved oxygen, arsenic, sulfate, and chloride concentrations, as well as electrical conductivity, using two ML algorithms, Random Forest and Deep feed-forward Neural Network, and compared their performance against the standard Linear Regression model. Our results indicate that the two ML algorithms model these variables much more accurately than the classical Linear Regression model, with the Deep feed-forward Neural Network being the most effective at identifying the relative importance of the drivers and capturing nonlinear relationships between drivers and water quality variables. Our analysis also revealed that the Julian day and the year in which the sample was taken can serve as surrogates for air temperature in modeling water temperature and dissolved oxygen, with only a slight reduction in performance. Arsenic, sulfate, and chloride show more complex behaviors in which geogenic and anthropogenic sources are intertwined. Dilution plays a role chiefly for arsenic concentration, which suggests a spatially non-uniform geogenic origin for this variable.