Abstract

Persistent organic pollutants (POPs) are highly toxic and difficult to degrade in the natural ecology, which has a severe negative impact on the ecological environment. Quantifying changes in the concentrations of persistent organic pollutants in the Great Lakes is challenging work. Machine learning (ML) methods are potent predictors that have recently achieved impressive performance on time series tasks. ARIMA model, Linear Regression methods, XGBoost algorithm, and Long Short-Term Memory (LSTM) are commonly used for estimating time-series changes. Traditionally Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) have been standard criteria to measure the error between the actual value and predicted value; however, Euclidean distance (ED) cannot effectively calculate the similarity between two-time series. We proposed an alternative criterion called Penalty Dynamic Time Wrapping (Penalty-DTW) based on Dynamic Time Wrapping (DTW). It can accurately measure the difference between the actual value and the predicted value. We study the benefits of Penalty-DTW vs. ED under the above ML algorithms. Further, considering the machine learning algorithm’s uncertainty, we proposed combining LSTM and deep ensemble methods to quantify algorithms uncertainty and make a confident prediction. We find improved LSTM model outperformed other predictive power models by comparing pollutant concentration prediction. The prediction results show that the concentration of pollutants has a stable downward trend in recent years. Simultaneously, we found that pollutants’ concentration correlates with seasons, which positively guides environmental pollution control in the Great Lakes.

Highlights

  • The Great Lakes of North America are a series of large interconnected freshwater lakes and are generally on or near the Canada-United States border

  • PENALTY-Dynamic Time Wrapping (DTW) METRIC 1) COMPARISON OF EUCLIDEAN DISTANCE AND DTW Time series cluster is typical verification to measure the similarity of different time series

  • We observe that increasing the number of Long Short-Term Memory (LSTM) units significantly improves the performance in terms of Root Mean Squared Error (RMSE) score and Penalty-DTW score

Read more

Summary

Introduction

The Great Lakes of North America are a series of large interconnected freshwater lakes and are generally on or near the Canada-United States border. They are lakes Superior, Michigan, Huron, Erie, and Ontario. Eriksen et al [2], and Mason et al [3] confirmed the existence of microplastics in Great Lakes’ surface water. Baldwin et al [4] found the presence of plastics in the Great Lakes’ water. These microplastics contain many chemicals, and persistent organic pollutants (POPs) are one of them

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call