Correlated Anomaly Detection from Large Streaming Data

Zheng Chen, Xiaohua Hu, Xinli Yu, Wei Quan, Yuan Ling, Erjia Yan, Bo Song

doi:10.1109/bigdata.2018.8622004

Abstract

Correlated anomaly detection (CAD) from streaming data is a type of group anomaly detection and an essential task in real-time data mining applications such as botnet detection, financial event detection, and industrial process monitoring. In previous research, the primary approach to this type of detection is based on the principal score (PS) of divided batches or sliding windows, obtained by computing the top eigenvalues of the correlation matrix, e.g. with the Lanczos algorithm. However, this paper identifies the phenomenon of principal score degeneration for large data sets, and then proves, mathematically and in practice, that current PS-based methods are likely to fail for CAD on large-scale streaming data, even if the number of correlated anomalies grows with the data size at a reasonable rate; in reality, anomalies tend to be a minority of the data, so the issue can be more serious. We propose a framework with two novel randomized algorithms, rPS and gPS, for better detection of correlated anomalies from large streaming data of various correlation strengths. Experiments show high and balanced recall and estimated accuracy for our framework on anomaly detection from a large server log data set and a U.S. stock daily price data set, in comparison to direct principal score evaluation and some other recent group anomaly detection algorithms. Moreover, our techniques significantly improve the computational efficiency and scalability of principal score calculation.
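To make the baseline concrete, here is a minimal sketch of a principal-score computation over sliding windows. It assumes the principal score of a window is simply the largest eigenvalue of that window's correlation matrix; the abstract does not give the exact definition, so the paper's formula may differ. The function names, window parameters, and synthetic data are hypothetical, and this illustrates only the direct PS evaluation, not the rPS or gPS algorithms.

```python
import numpy as np

def principal_score(window):
    """Top eigenvalue of the correlation matrix of one window.

    window: array of shape (timesteps, streams).
    Assumption: "principal score" is taken to be the largest eigenvalue;
    the paper's exact definition is not stated in the abstract.
    """
    corr = np.corrcoef(window, rowvar=False)  # streams x streams
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues
    return float(np.linalg.eigvalsh(corr)[-1])

def sliding_scores(data, width, step):
    """Principal score of each sliding window over the time axis."""
    last_start = data.shape[0] - width
    return [principal_score(data[s:s + width])
            for s in range(0, last_start + 1, step)]

# Synthetic data (hypothetical): 50 independent noise streams,
# then a copy in which the first 10 streams share a common signal.
rng = np.random.default_rng(0)
timesteps, streams, group = 200, 50, 10
noise = rng.standard_normal((timesteps, streams))
correlated = noise.copy()
shared = rng.standard_normal(timesteps)
correlated[:, :group] += 2.0 * shared[:, None]

score_noise = principal_score(noise)
score_corr = principal_score(correlated)
window_scores = sliding_scores(correlated, width=100, step=50)
```

The correlated group pushes the top eigenvalue above the pure-noise level, which is the signal a PS-based detector thresholds on. The full eigendecomposition here is for clarity only; for large correlation matrices a Lanczos-type solver such as `scipy.sparse.linalg.eigsh` could compute just the top eigenvalues.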

