Abstract
To operate wastewater treatment plants (WWTPs) with optimized efficiency, influent conditions (ICs), the initial states of the inflow fed to WWTPs, were monitored to identify potential anomalies that would trigger adverse events or system crashes. To employ voluminous measurements for data-driven decisions, the non-linear, non-Gaussian, non-stationary, auto-correlated, cross-correlated, heteroskedastic, and case-specific nature of multivariate environmental datasets must be considered. This research proposed kernel machine learning models, namely kernel principal component analysis based one-class support vector machines (KPCA-OCSVM) with various kernels, to learn from an anomaly-free training set and then classify the testing set. A seven-year multivariate ICs time series was introduced, with exploratory analysis performed to reveal temporal behaviors and statistical properties. KPCA with polynomial kernels output sufficiently representative features, based on which OCSVM with Gaussian kernels sensitively and specifically identified anomalies in ICs that had previously been missed by WWTP operators. The proposed kernel algorithms surpassed previous linear PCA-based K-nearest-neighbors models and improved outcomes with a limited increase in computation cost. Without requiring linear, Gaussian, stationary, independent, and homoskedastic qualities from data, the proposed flexible environmental data science approach could be transferred, rebuilt, and tuned conveniently for ICs from different WWTPs.
Highlights
This paper proposed an effective monitoring strategy merging the desirable characteristics of kernel PCA (KPCA) modeling with an unsupervised one-class support vector machine (OCSVM) scheme to distinguish normal from abnormal measurements.
Exploratory analysis and visualization of a historical multivariate influent conditions (ICs) dataset: to verify the hypothesis, one dataset from a full-scale WWTP was used for training and testing.
KPCAs with radial basis function (RBF) kernels performed poorly with fewer principal components (PCs) and better with more, but using more PCs would involve intensive computation or potential overfitting and is not suitable for online process monitoring or knowledge sharing among different WWTP ICs.
Summary
Control and automation in WWTPs are producing quantities of multivariate time series data, which are often unexploited. This "data-rich, information-poor" dilemma is attributed to the lack of methodology to select the right algorithm for a given case, the lack of standard prototypical data processing procedures, and the lack of trained environmental data scientists or data science expertise among environmental scientists [4]. This paper proposed an effective monitoring strategy merging the desirable characteristics of KPCA modeling with an unsupervised one-class SVM (OCSVM) scheme to distinguish normal from abnormal measurements. In this regard, KPCA was used to account for nonlinearities in the multivariate ICs data.
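The KPCA-OCSVM idea described above can be illustrated with a minimal sketch, assuming scikit-learn is available. This is not the authors' implementation: the synthetic five-variable data, the mixing matrix, the number of components, the polynomial degree, and the nu value are all hypothetical choices made only for illustration. The sketch follows the combination named in the abstract (polynomial-kernel KPCA for feature extraction, Gaussian-kernel OCSVM for one-class classification) and trains on anomaly-free data only.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical mixing matrix: two latent drivers produce five
# cross-correlated "influent" variables (a stand-in, not the paper's data).
MIXING = np.array([[1.0, 0.5, 0.2, 0.0, 0.3],
                   [0.0, 0.4, 0.9, 0.6, 0.1]])

def make_normal(n):
    """Draw n anomaly-free synthetic samples with 5 correlated variables."""
    latent = rng.normal(size=(n, 2))
    return latent @ MIXING + 0.1 * rng.normal(size=(n, 5))

X_train = make_normal(500)                  # anomaly-free training set
X_test_normal = make_normal(100)            # held-out normal samples
X_test_anomalous = make_normal(20) + 6.0    # assumed anomalies: large shift

# Scale, extract nonlinear features with polynomial-kernel KPCA, then fit a
# Gaussian (RBF) kernel one-class SVM on those features.
model = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=3, kernel="poly", degree=2),  # assumed settings
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),   # assumed settings
)
model.fit(X_train)

# predict() returns +1 for samples classified as normal and -1 for anomalies.
pred_normal = model.predict(X_test_normal)
pred_anomalous = model.predict(X_test_anomalous)

print("fraction of normal samples accepted:", (pred_normal == 1).mean())
print("fraction of anomalies flagged:", (pred_anomalous == -1).mean())
```

In practice the kernel types, the number of retained components, and nu would need to be tuned for each plant's ICs. The values here only demonstrate how the two stages are chained.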