Durable top-k queries on temporal data

Junyang Gao,Pankaj K Agarwal,Jun Yang

doi:10.14778/3275366.3275371

Abstract

Many datasets have a temporal dimension and contain a wealth of historical information. When using such data to make decisions, we often want to examine not only the current snapshot of the data but also its history. For example, given a result object of a snapshot query, we can ask for its "durability," or intuitively, how long (or how often) it was valid in the past. This paper considers durable top-k queries, which look for objects whose values were among the top k for at least some fraction of the times during a given interval---e.g., stocks that were among the top 20 most heavily traded for at least 80% of the trading days during the last quarter of 2017. We present a comprehensive suite of techniques for solving this problem, ranging from exact algorithms where k is fixed in advance, to approximate methods that work for any k and are able to exploit workload and data characteristics to improve accuracy while capping index cost. We show that our methods vastly outperform baseline and previous methods using both real and synthetic datasets.

Full Text