Abstract

The automated detection of sequential anomalies in time series is an essential task for many applications, such as the monitoring of technical systems, fraud detection in high-frequency trading, or the early detection of disease symptoms. All these applications require the detection to find all sequential anomalies possibly fast on potentially very large time series. In other words, the detection needs to be effective, efficient and scalable w.r.t. the input size. Series2Graph is an effective solution based on graph embeddings that are robust against re-occurring anomalies and can discover sequential anomalies of arbitrary length and works without training data. Yet, Series2Graph is no t scalable due to its single-threaded approach; it cannot, in particular, process arbitrarily large sequences due to the memory constraints of a single machine. In this paper, we propose our distributed anomaly detection system, short DADS, which is an efficient and scalable adaptation of Series2Graph. Based on the actor programming model, DADS distributes the input time sequence, intermediate state and the computation to all processors of a cluster in a way that minimizes communication costs and synchronization barriers. Our evaluation shows that DADS is orders of magnitude faster than S2G, scales almost linearly with the number of processors in the cluster and can process much larger input sequences due to its scale-out property.

Highlights

  • Time series analysis is a multi-disciplinary field with use cases in astrophysics [12], neurosciences [37], environmental monitoring [35], Internet of things [14], finance [46], asset tracking [29], aviation engineering [7] and many further disciplines

  • An effective sequential anomaly detection algorithm automatically marks all anomalous, i.e., infrequent subsequences in a time series; thereby the algorithm is robust against reoccurring anomalies, can discover sequential anomalies of arbitrary length and works without training data

  • We introduce our Distributed Anomaly Detection System (DADS)

Read more

Summary

Sequential anomaly detection

Time series analysis is a multi-disciplinary field with use cases in astrophysics [12], neurosciences [37], environmental monitoring [35], Internet of things [14], finance [46], asset tracking [29], aviation engineering [7] and many further disciplines. Typical time series of terabyte to petabyte size, such as those that often appear in domains like bioinformatics [20] or astrophysics [51], are hardly computable with the current version of S2G For this reason, we introduce our Distributed Anomaly Detection System (DADS). Point anomaly detection is a important tool for the analysis of sensor networks For this special purpose, several time series mining algorithms have been introduced [35]. To identify distance- or density-based outliers, algorithms, such as [44], propose to estimate the data distribution of the underlying time series across the network of sensors and, calculate local outliers to these estimates. An extensive comparison of different algorithms can be found in [45]

Related work
Point anomaly detection
Sequence anomaly detection
Distributed anomaly detection
Distributed principal component analysis
Foundations
Anomaly detection
Distributed computing
Series2Graph algorithm overview
Subsequence embedding
Node extraction
Edge extraction
Distributed anomaly detection system
DADS protocols
DADS architecture
Processor registration
Sequence slice distribution
Principal component analysis
Dimension reduction
Distributed node extraction
Edge creation
Graph merging
Distributed scoring
10 Evaluation
10.1 Anomaly detection quality
10.3 Scaling in cluster size
10.2 Scaling in memory capacity
10.4 Scaling in data size
11 Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.