Efficient Approximate OLAP Querying Over Time Series

Kasun S Perera,Christian Thomsen,Torben Bach Pedersen,Martin Hahmann,Wolfgang Lehner

doi:10.1145/2938503.2938526

Abstract

The ongoing trend for data gathering not only produces larger volumes of data, but also increases the variety of recorded data types. Out of these, especially time series, e.g. various sensor readings, have attracted attention in the domains of business intelligence and decision making. As OLAP queries play a major role in these domains, it is desirable to also execute them on time series data. While this is not a problem on the conceptual level, it can become a bottleneck with regards to query run-time. In general, processing OLAP queries gets more computationally intensive as the volume of data grows. This is a particular problem when querying time series data, which generally contains multiple measures recorded at fine time granularities. Usually, this issue is addressed either by scaling up hardware or by employing workload based query optimization techniques. However, these solutions are either costly or require continuous maintenance. In this paper we propose an approach for approximate OLAP querying of time series that offers constant latency and is maintenance-free. To achieve this, we identify similarities between aggregation cuboids and propose algorithms that eliminate the redundancy these similarities present. In doing so, we can achieve compression rates of up to 80% while maintaining low average errors in the query results.

Full Text