OmniSketch: Efficient Multi-Dimensional High-Velocity Stream Analytics with Arbitrary Predicates

Wieger R Punter, Odysseas Papapetrou, Minos Garofalakis

doi:10.14778/3632093.3632098

Abstract

A key need in different disciplines is to perform analytics over fast-paced data streams, similar in nature to traditional OLAP analytics in relational databases, i.e., with filters and aggregates. Storing unbounded streams, however, is neither realistic nor desirable, due to the high storage requirements and the delays introduced when storing massive data. Accordingly, many synopses (sketches) have been proposed that summarize the stream in small memory (usually small enough to fit in RAM), so that aggregate queries can be approximated efficiently without storing the full stream. However, past synopses predominantly focus on summarizing single-attribute streams and cannot efficiently handle filters and constraints on arbitrary subsets of multiple attributes. In this work, we propose OmniSketch, the first sketch that scales to fast-paced and complex data streams (with many attributes) and supports count aggregates with filters on multiple attributes, dynamically chosen at query time. The sketch offers probabilistic guarantees, a favorable space-accuracy tradeoff, and worst-case logarithmic complexity for updates and for query execution. We demonstrate experimentally, with both real and synthetic data, that the sketch outperforms the state of the art and that it can approximate complex ad-hoc queries within the configured accuracy guarantees, with small memory requirements.
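To make the gap described in the abstract concrete, here is a minimal sketch of a classic single-attribute synopsis (a Count-Min sketch) next to an exact full-scan count. It is not OmniSketch and does not reproduce the authors' method. The class name, the `width` and `depth` values, the helper `exact_count`, and the toy stream with `src`, `dst`, and `port` attributes are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Single-attribute Count-Min sketch. Illustrative only; NOT OmniSketch."""

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        # width and depth are arbitrary illustrative values (assumptions).
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, key: str) -> int:
        # One hash function per row, derived by salting BLAKE2b with the row index.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += 1

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum never underestimates.
        return min(
            self.table[row][self._bucket(row, key)] for row in range(self.depth)
        )


def exact_count(stream, **predicate):
    """Exact baseline: scan the stored stream and count matching records."""
    return sum(all(rec[k] == v for k, v in predicate.items()) for rec in stream)


# Hypothetical toy stream with three attributes.
stream = [
    {"src": "a", "dst": "x", "port": 80},
    {"src": "a", "dst": "y", "port": 80},
    {"src": "a", "dst": "x", "port": 443},
    {"src": "b", "dst": "x", "port": 80},
]

sketch_on_src = CountMinSketch()
for rec in stream:
    sketch_on_src.add(rec["src"])

# Single-attribute filter: answerable from the sketch (an upper-bound estimate).
src_estimate = sketch_on_src.estimate("a")

# Multi-attribute filter chosen at query time: this sketch holds no information
# about `port`, so only the full scan can answer it here.
src_and_port = exact_count(stream, src="a", port=80)
```

A sketch built on `src` alone can estimate `COUNT(*) WHERE src = 'a'`, but a predicate such as `src = 'a' AND port = 80` would require either the stored stream or a separate synopsis prepared in advance for that attribute combination. That is the limitation of single-attribute synopses that the abstract refers to.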

Published in: Proceedings of the VLDB Endowment
