Abstract
We consider the problem of learning to detect anomalous time series from an unlabeled training set that may itself be contaminated with anomalies. This scenario is important for applications in medicine, economics, and industrial quality control, in which labeling is difficult and requires expensive expert knowledge, and anomalous data is hard to obtain. This article presents a novel method for unsupervised anomaly detection based on the shapelet transformation for time series. Our approach learns representative features that describe the shape of time series stemming from the normal class, and simultaneously learns to accurately detect anomalous time series. We propose an objective function that encourages a feature representation in which the normal time series lie within a compact hypersphere in feature space, whereas anomalous observations lie outside its decision boundary. This objective is optimized by a block-coordinate descent procedure. Because the learned feature representation is reused, our method can efficiently detect anomalous time series in unseen test data without retraining the model. We demonstrate on multiple benchmark data sets that our approach reliably detects anomalous time series and is more robust than competing methods when the training instances contain anomalous time series.
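The two ingredients named in the abstract, a shapelet transformation and a hypersphere decision rule, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the length-normalized squared Euclidean distance, and the hand-picked shapelet, center, and radius are all assumptions made for illustration. In the method described above, the shapelets and the hypersphere are learned jointly by block-coordinate descent, which this sketch does not attempt.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Smallest mean squared difference between the shapelet and any
    equally long sliding window of the series (assumed distance)."""
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.mean((windows - shapelet) ** 2, axis=1)))

def shapelet_transform(dataset, shapelets):
    """Map each series to a feature vector with one distance per shapelet."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets]
                     for s in dataset])

def is_anomalous(features, center, radius):
    """Flag feature vectors that fall outside the hypersphere."""
    return np.sum((features - center) ** 2, axis=1) > radius ** 2

# Hypothetical toy data: the shapelet is fixed by hand, not learned.
shapelets = [np.array([0.0, 1.0, 0.0])]
data = [np.array([0.0, 0.0, 1.0, 0.0, 0.0]),  # contains the shape
        np.zeros(5)]                           # does not contain it
features = shapelet_transform(data, shapelets)
flags = is_anomalous(features, center=np.zeros(1), radius=0.1)
print(flags)
```

Once shapelets, center, and radius are fixed, scoring a new series only requires the transform and one distance comparison, which is why unseen test data can be scored without retraining.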
Highlights
Detecting anomalous instances in temporal sequence data is an important problem in domains such as economics (Hyndman et al 2015b), medicine (Chuah and Fu 2007), astronomy (Rebbapragada et al 2009), and computer safety and intrusion detection (Zhong et al 2007)
Whereas several anomaly detection methods learn models of normal time series under the assumption that all training data is normal (Mahoney and Chan 2005; Salvador and Chan 2005; Rebbapragada et al 2009), we present a novel method based on the Support Vector Data Description (SVDD) (Tax and Duin 2004) that learns to detect anomalous time series even if the training set is contaminated with occasional anomalies
We decide whether a data problem is suitable for evaluating our anomaly detection method ADSL based on several decision criteria, which are displayed in Fig. 5: first, time series longer than 700 measurements per observation are excluded because of limited processing capacity
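For reference, the Support Vector Data Description cited in the second highlight is commonly stated in its soft-boundary form as below. This is the standard formulation from the SVDD literature, not the objective of the method summarized here, which additionally learns the feature representation. The symbols are assumptions for this note: x_i are the n feature vectors, a is the center, R the radius, ξ_i the slack variables, and C a trade-off parameter.

```latex
\min_{R,\,a,\,\xi} \; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
\lVert x_i - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

A point x is then flagged as anomalous when \lVert x - a \rVert^2 > R^2. The slack variables let some training points fall outside the sphere, which is what allows a training set contaminated with occasional anomalies to be tolerated.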
Summary
Detecting anomalous instances in temporal sequence data is an important problem in domains such as economics (Hyndman et al 2015b), medicine (Chuah and Fu 2007), astronomy (Rebbapragada et al 2009), and computer safety and intrusion detection (Zhong et al 2007). Consider, for example, a medical application in which a wearable sensor monitors electrocardiogram data. A person's healthy heartbeat shows more or less the same electrocardiographic measurement pattern. For each heartbeat, we have a temporal sequence of measurements, resulting in a data set of heartbeat measurements of a single person. Suppose the patient develops first signs of arrhythmia, which cause deviations from the healthy heartbeats: these deviations are the anomalies. The anomaly detection model in the sensor detects these anomalous measurements and can raise an alarm or inform the patient's doctor.