Abstract

Maintaining multistream and time-delayed statistics continuously and online is a significant challenge in data management. This chapter presents a scalable solution that gives a guaranteed response time with high accuracy. The Discrete Fourier Transform (DFT) reduces the enormous raw data streams to a manageable synoptic data structure and gives good I/O performance. For any pair of streams, the pairwise statistic is computed incrementally from a DFT approximation and requires constant time per update. A sliding/basic window framework is introduced to manage the streaming data digests efficiently. The correlation coefficient similarity measure is reduced to a Euclidean distance measure, and a grid structure is then used to detect correlations among thousands of high-speed data streams in real time. Experiments on synthetic and real data show that StatStream detects correlations efficiently and precisely.
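
The reduction from correlation to Euclidean distance can be illustrated with a minimal sketch. It assumes each window is normalized to zero mean and unit Euclidean norm, in which case the Pearson correlation equals 1 - d^2/2, where d is the distance between the normalized windows. The function names (normalize, correlation_via_distance, dft_digest, digest_distance) are hypothetical and chosen for illustration; this is not the chapter's implementation, and it recomputes each digest from scratch rather than updating it incrementally.

```python
import numpy as np

def normalize(x):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return centered / np.linalg.norm(centered)

def correlation_via_distance(x, y):
    """Recover the Pearson correlation from the distance of normalized windows."""
    d = np.linalg.norm(normalize(x) - normalize(y))
    return 1.0 - 0.5 * d * d

def dft_digest(x, n_coeffs):
    """Keep the first n_coeffs coefficients of the unitary DFT (assumed digest form)."""
    return np.fft.fft(normalize(x), norm="ortho")[:n_coeffs]

def digest_distance(x, y, n_coeffs):
    """Distance between digests; by Parseval it never exceeds the true distance."""
    return np.linalg.norm(dft_digest(x, n_coeffs) - dft_digest(y, n_coeffs))
```

Because the unitary DFT preserves Euclidean distance, the distance between truncated digests is a lower bound on the true distance. A pair of streams whose digests are far apart therefore cannot be highly correlated, which is what makes a grid over the digest coordinates usable as a filter for candidate pairs.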
