Abstract

Anomaly detection (AD) in high-volume environmental data is challenging because anomalous events are typically rare, the range of possible anomaly types is broad, and local environmental conditions are nonstationary. These challenges suggest the need for flexible statistical methods that can cope with unbalanced, high-volume data. Here, we aimed to detect anomalies caused by technical errors in water-quality (turbidity and conductivity) data collected by automated in situ sensors deployed in contrasting riverine and estuarine environments. We first applied a range of artificial neural networks that differed in both learning method and hyperparameter values, then calibrated the models using a Bayesian multiobjective optimization procedure, and finally selected and evaluated the "best" model for each water-quality variable, environment, and anomaly type. We found that semi-supervised classification was better able to detect sudden spikes, sudden shifts, and small sudden spikes, whereas supervised classification had higher accuracy for predicting long-term anomalies associated with drifts and periods of otherwise unexplained high variability.
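To illustrate the semi-supervised idea in its simplest form, the sketch below learns what "normal" step-to-step change looks like from anomaly-free readings only, then flags new readings whose change falls far outside that range. It is a deliberately simple statistical stand-in, not the neural networks used in the study; the function names, the threshold `k`, and the turbidity values are all hypothetical.

```python
from statistics import mean, stdev


def fit_normal_profile(clean_diffs):
    """Summarise step-to-step changes seen in anomaly-free training data."""
    return mean(clean_diffs), stdev(clean_diffs)


def flag_spikes(series, profile, k=4.0):
    """Flag index i when the change from series[i-1] to series[i] lies
    more than k standard deviations from the mean change seen in training.
    The first reading has no predecessor and is never flagged."""
    mu, sigma = profile
    flags = [False]
    for prev, cur in zip(series, series[1:]):
        flags.append(abs((cur - prev) - mu) > k * sigma)
    return flags


# Hypothetical turbidity readings (arbitrary units), assumed anomaly-free.
clean = [5.0, 5.2, 5.1, 5.3, 5.2, 5.4, 5.3, 5.5, 5.4, 5.6]
clean_diffs = [b - a for a, b in zip(clean, clean[1:])]
profile = fit_normal_profile(clean_diffs)

# New readings with one injected sudden spike at index 3.
new = [5.5, 5.6, 5.7, 25.0, 5.8, 5.9]
flags = flag_spikes(new, profile)
```

Because the rule works on differences, it flags both the jump into the spike (index 3) and the return to baseline (index 4). A rule of this kind sees only abrupt changes, so it would miss slow drifts, which is consistent with the finding above that long-term anomalies were better handled by supervised classification.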
