Abstract

To operate wastewater treatment plants (WWTPs) efficiently, influent conditions (ICs), the initial states of the inflow fed to WWTPs, were monitored to identify potential anomalies that could trigger adverse events or system crashes. To use voluminous measurements for data-driven decisions, the non-linear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, hetero-skedastic, and case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, namely kernel principal components analysis based one-class support vector machines (KPCA-OCSVM) with various kernels, that learn from an anomaly-free training set and then classify the testing set. A seven-year multivariate ICs time series was introduced, and exploratory analysis was performed to reveal its temporal behaviors and statistical properties. KPCA with polynomial kernels produced sufficiently representative features, based on which OCSVM with Gaussian kernels identified, with sensitivity and specificity, anomalies in ICs that WWTP operators had previously missed. The proposed kernel algorithms outperformed previous linear PCA-based K-nearest-neighbors models and improved outcomes with a limited increase in computation cost. Because it does not require the data to be linear, Gaussian, stationary, independent, or homo-skedastic, the proposed flexible environmental data science approach could be conveniently transferred, rebuilt, and tuned for ICs from different WWTPs.

Highlights

  • This paper proposed an effective monitoring strategy merging the desirable characteristics of kernel PCA (KPCA) modeling with an unsupervised one-class support vector machine (OCSVM) scheme to distinguish normal from abnormal measurements.

  • Exploratory analysis and visualization of a historical multivariate influent conditions (ICs) dataset: to verify the hypothesis, one dataset from a full-scale WWTP was used for training and testing.

  • Radial basis function (RBF) KPCAs performed poorly with fewer principal components (PCs) and better with more, but using more PCs would involve intensive computation or potential overfitting and is not suitable for online process monitoring or knowledge sharing among different WWTP ICs.


Summary

INTRODUCTION

Control and automation in WWTPs are producing large quantities of multivariate time series data, which are often unexploited. This “data-rich, information-poor” dilemma is attributed to the lack of methodology to select the right algorithm for a given case, the lack of standard prototypical data processing procedures, and the lack of trained environmental data scientists or of data science expertise among environmental scientists [4]. This paper proposed an effective monitoring strategy merging the desirable characteristics of KPCA modeling with an unsupervised one-class SVM (OCSVM) scheme to distinguish normal from abnormal measurements. In this regard, KPCA was used to account for nonlinearities in the multivariate ICs data.
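
The two-stage strategy described above can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn is available; it is not the authors' implementation. The data are synthetic stand-ins for influent measurements, and the hyperparameters (`n_components=3`, `degree=2`, `nu=0.05`, the standardization step) are hypothetical choices for illustration, not values reported in the paper. Only the kernel choices (polynomial for KPCA, Gaussian/RBF for OCSVM) follow the abstract.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for influent measurements (NOT the paper's dataset):
# 500 anomaly-free training rows, 5 cross-correlated variables.
mixing = rng.normal(size=(2, 5))
X_train = rng.normal(size=(500, 2)) @ mixing + 0.1 * rng.normal(size=(500, 5))

# Test set: 50 rows drawn like the training data, plus 10 rows
# shifted away from the training cloud to mimic anomalies.
X_normal = rng.normal(size=(50, 2)) @ mixing + 0.1 * rng.normal(size=(50, 5))
X_shifted = X_normal[:10] + 8.0
X_test = np.vstack([X_normal, X_shifted])

# Stage 1: KPCA with a polynomial kernel extracts nonlinear features.
# Stage 2: OCSVM with a Gaussian (RBF) kernel learns the boundary of
# the anomaly-free training set in that feature space.
model = make_pipeline(
    StandardScaler(),                                   # assumed preprocessing
    KernelPCA(n_components=3, kernel="poly", degree=2),  # illustrative settings
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),   # illustrative settings
)
model.fit(X_train)

# predict() returns +1 for points judged normal and -1 for anomalies.
labels = model.predict(X_test)
print("flagged among unshifted rows:", int((labels[:50] == -1).sum()))
print("flagged among shifted rows:  ", int((labels[50:] == -1).sum()))
```

Because the one-class model is fitted on anomaly-free data only, no anomaly labels are needed at training time; `nu` loosely bounds the fraction of training points treated as outliers and would need tuning for a real ICs dataset.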

LINEAR AND KERNEL PRINCIPAL COMPONENTS ANALYSIS MODELS
RESULTS AND DISCUSSION
CONCLUSION
