Event queries on correlated probabilistic streams

Christopher Ré,Magdalena Balazinksa,Dan Suciu,Julie Letchner

doi:10.1145/1376616.1376688

Abstract

A major problem in detecting events in streams of data is that the data can be imprecise (e.g. RFID data). However, current state-ofthe-art event detection systems such as Cayuga [14], SASE [46] or SnoopIB[1], assume the data is precise. Noise in the data can be captured using techniques such as hidden Markov models. Inference on these models creates streams of probabilistic events which cannot be directly queried by existing systems. To address this challenge we propose Lahar1, an event processing system for probabilistic event streams. By exploiting the probabilistic nature of the data, Lahar yields a much higher recall and precision than deterministic techniques operating over only the most probable tuples. By using a novel static analysis and novel algorithms, Lahar processes data orders of magnitude more efficiently than a naïve approach based on sampling. In this paper, we present Lahar's static analysis and core algorithms. We demonstrate the quality and performance of our approach through experiments with our prototype implementation and comparisons with alternate methods.

Full Text