Abstract

Changes in hydrological characteristics and increased pollutant loadings caused by rapid climate change and urbanization contribute significantly to the deterioration of aquatic ecosystem health (AEH). It is therefore important to evaluate AEH in advance and to establish appropriate strategic plans. Machine learning (ML) models have recently been widely used to solve hydrological and environmental problems. However, collecting sufficient data for ML training is generally time-consuming and labor-intensive, and in classification problems in particular, data imbalance can lead to erroneous predictions. In this study, we propose a method that addresses the data imbalance problem through data augmentation based on the Wasserstein Generative Adversarial Network (WGAN) and uses ML models to efficiently predict the grades (A to E) of three AEH indices: the Benthic Macroinvertebrate Index (BMI), the Trophic Diatom Index (TDI), and the Fish Assessment Index (FAI). Raw datasets for the AEH indices, composed of physicochemical factors (WT, DO, BOD5, SS, TN, TP, and Flow) and AEH grades, were built and augmented through the WGAN. The performance of each ML model was evaluated through 10-fold cross-validation (CV), and the models trained on the raw and WGAN-based training sets were compared through AEH grade prediction on the test sets. The ML models trained on the WGAN-based training set achieved an average F1-score across grades of 0.9 or greater for each AEH index on the test set, outperforming the models trained only on the raw training set, which contained fewer data than the other datasets.
These results confirm that, with a dataset augmented through WGAN, an ML model can predict AEH grades better than a model trained on a limited dataset; the approach also reduces the effort required to collect actual data from rivers, which demands enormous time and cost. In the future, the results of this study can serve as basic data for constructing big data on aquatic ecosystems, which is needed to efficiently evaluate and predict AEH in rivers with ML models.
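
The abstract reports an average F1-score across the grades of each AEH index. As a rough illustration only, the sketch below computes a per-grade F1 and its unweighted (macro) average in plain Python. The choice of macro averaging is an assumption, since the abstract does not state how the average was taken, and the labels are made-up toy values, not data from the study. The function names `per_grade_f1` and `macro_f1` are hypothetical.

```python
# Toy sketch: per-grade F1 and its unweighted (macro) average.
# Assumption: macro averaging; the abstract does not specify the averaging scheme.
GRADES = ["A", "B", "C", "D", "E"]


def per_grade_f1(y_true, y_pred, grades=GRADES):
    """Return {grade: F1} using one-vs-rest counts for each grade."""
    scores = {}
    for g in grades:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == g and p == g)
        fp = sum(1 for t, p in pairs if t != g and p == g)
        fn = sum(1 for t, p in pairs if t == g and p != g)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the grade never occurs.
        scores[g] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, grades=GRADES):
    """Unweighted mean of the per-grade F1 scores."""
    scores = per_grade_f1(y_true, y_pred, grades)
    return sum(scores.values()) / len(scores)


# Invented example labels (not from the paper).
y_true = ["A", "A", "B", "C", "D", "E", "E", "C"]
y_pred = ["A", "B", "B", "C", "D", "E", "D", "C"]

if __name__ == "__main__":
    print(per_grade_f1(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 4))
```

Because every grade contributes equally to a macro average, a poorly predicted minority grade lowers the score as much as a majority grade would, which is one reason such a metric is often preferred on imbalanced grade distributions.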

Highlights

  • The main objectives of this study are (i) to evaluate the applicability of the Wasserstein Generative Adversarial Network (WGAN) for resolving the imbalanced distribution of the data samples required to predict the grades of the aquatic ecosystem health (AEH) indices in classification problems, and (ii) to propose a method for building machine learning (ML) models that effectively evaluate the grades of each AEH index by considering multiple physicochemical factors such as flow, water quality, and water temperature

  • In the WGAN training process, we found that the loss function of the discriminator showed large fluctuations in the initial stage (Figure 7)

  • We found that the ML model trained on the WGAN-based training set outperformed the model trained on the raw training set with an

Summary

Introduction

The results of AEH evaluation based on these indices can serve as basic data for selecting impaired rivers that require aquatic ecosystem management and for establishing river ecosystem restoration plans. Very few studies have evaluated the applicability of the WGAN for augmenting standardized data samples in hydrology and aquatic ecology, or its effect on the training and testing performance of ML models. The main objectives of this study are (i) to evaluate the applicability of the WGAN for resolving the imbalanced distribution of the data samples required to predict the grades of the AEH indices in classification problems, and (ii) to propose a method for building ML models that effectively evaluate the grades of each AEH index by considering multiple physicochemical factors such as flow, water quality, and water temperature.
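
To make the notion of an imbalanced grade distribution concrete, the sketch below counts the samples per grade and reports how many synthetic samples each grade would need to match the largest one. Balancing every grade up to the majority count is an assumption made here for illustration; the text above does not state the augmentation target used in the study. The counts are invented toy values, and `synthetic_samples_needed` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def synthetic_samples_needed(grades, target=None):
    """Return {grade: number of extra samples needed to reach `target`}.

    Assumption: if `target` is None, every grade is balanced up to the
    size of the most frequent grade.
    """
    counts = Counter(grades)
    if target is None:
        target = max(counts.values())
    # Grades already at or above the target need no synthetic samples.
    return {g: max(0, target - n) for g, n in sorted(counts.items())}


# Invented grade counts (not from the paper).
observed = ["A"] * 5 + ["B"] * 12 + ["C"] * 20 + ["D"] * 9 + ["E"] * 3

if __name__ == "__main__":
    print(synthetic_samples_needed(observed))
```

In a WGAN-based workflow, numbers like these would only indicate how many samples a trained generator might be asked to produce per grade; the generator and its training are outside the scope of this sketch.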

Description of the Study
ML Models Building and Evaluation Process
The Performance Evaluation Metrics of ML Models
Correlation Analysis Results
The results showed that
Correlation Analysis and WGAN-Based Data Augmentation Results
Comparison of Validation Results of ML Models
Comparison
Grade Prediction of Each AEH Index for Test Set Using the ML Models
Conclusions
