Application of machine learning approaches in predicting estuarine dissolved oxygen (DO) under a limited data environment

Mohammad Abu Zafer Siddik

doi:10.2166/wqrj.2022.002

Abstract

The application of machine learning (ML) approaches to predict estuarine dissolved oxygen (DO) from a set of environmental covariates that includes nutrients remains unexplored because of limited nutrient data availability. Using data from 12 water quality stations in southwest coastal Florida, this study examined the applicability of four ML models – support vector machine (SVM), random forest (RF), decision tree, and Wang–Mendel – in predicting DO under a limited nutrient data environment. Monthly water temperature, pH, salinity, total nitrogen (TN), and total phosphorus (TP) data were used for model development. A multiple linear regression model was trained as a benchmark against which to compare the ML model performances. The site-specific RF and SVM models showed superior model efficiency (Nash–Sutcliffe Efficiency > 0.80) when all the predictor variables were used for model development, whereas models trained without nutrients showed reduced prediction accuracy. When data from all sites were pooled and modeled under TN-limited, TP-limited, and TN- and TP-co-limited regimes, RF performed best. Overall, the study yielded two key conclusions that could complement the existing approaches to estimating total daily loads for environmental management: (1) nutrients are a necessary predictor of estuarine DO dynamics and (2) RF performs best among the ML methods examined under a limited data environment.
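
For readers unfamiliar with the Nash–Sutcliffe Efficiency (NSE) threshold quoted above, the following is a minimal sketch of the standard NSE definition: one minus the ratio of the residual sum of squares to the variance of the observations about their mean. The function name and the use of NumPy are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def nash_sutcliffe_efficiency(observed, predicted):
    """Standard NSE: 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Squared error of the model predictions
    residual_ss = np.sum((observed - predicted) ** 2)
    # Squared deviation of the observations from their own mean
    total_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_ss / total_ss
```

Under this definition, a value of 1 indicates a perfect match, and a value of 0 means the model predicts no better than the mean of the observations, so NSE > 0.80 indicates that the model error is small relative to the observed variability in DO.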

Full Text