Abstract
Given an array A[1, n] of elements drawn from a totally ordered set, we consider the problem of building a data structure that supports two queries: (a) selection queries receive a range [i, j] and an integer k and return the position of the kth largest element in A[i, j]; (b) top-k queries receive [i, j] and k and return the positions of the k largest elements in A[i, j]. These problems can be solved in optimal time, O(1 + lg k / lg lg n) and O(k), respectively, using linear-space data structures. We provide the first study of encoding data structures for these problems, where A cannot be accessed at query time. Several applications are interested only in the relative order of the entries of A and their positions, rather than their actual values, so A need not be kept at query time. In those cases encodings save storage space: we first show that any encoding answering such queries requires n lg k - O(n + k lg k) bits of space; then we design encodings using O(n lg k) bits, that is, asymptotically optimal up to constant factors, while preserving optimal query time.
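To make the query semantics concrete, the following Python sketch answers sel and top by brute force, sorting the range on every query. It is only a reference for checking answers, not the optimal-time data structure described above. Assumptions (ours, not from the text): positions are 1-based, the range [i, j] is inclusive, and equal values are ordered by position, smaller first. The helper `rank_transform` is hypothetical; it illustrates why only the relative order matters: replacing every value by its rank leaves all answers unchanged, because the queries return positions rather than values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def _positions_by_value(A: Sequence[T], i: int, j: int) -> List[int]:
    """Positions i..j (1-based, inclusive) sorted by decreasing A-value.

    Python's sort stays stable with reverse=True, so equal values keep
    increasing position order (our tie-breaking assumption).
    """
    if not 1 <= i <= j <= len(A):
        raise ValueError("need 1 <= i <= j <= len(A)")
    return sorted(range(i, j + 1), key=lambda p: A[p - 1], reverse=True)


def sel(A: Sequence[T], i: int, j: int, k: int) -> int:
    """Position of the kth largest element of A[i, j] (k = 1 is the maximum)."""
    order = _positions_by_value(A, i, j)
    if not 1 <= k <= len(order):
        raise ValueError("need 1 <= k <= j - i + 1")
    return order[k - 1]


def top(A: Sequence[T], i: int, j: int, k: int) -> List[int]:
    """Positions of (at most) the k largest elements of A[i, j], largest first."""
    return _positions_by_value(A, i, j)[:k]


def rank_transform(A: Sequence[T]) -> List[int]:
    """Hypothetical helper: replace each value by its rank among distinct values.

    Equal values get equal ranks, so the relative order of A is preserved exactly.
    """
    rank = {v: r for r, v in enumerate(sorted(set(A)))}
    return [rank[v] for v in A]


if __name__ == "__main__":
    A = [5, 1, 9, 3, 7]
    print(sel(A, 2, 5, 1))  # 3  (value 9)
    print(sel(A, 2, 5, 2))  # 5  (value 7)
    print(top(A, 1, 5, 3))  # [3, 5, 1]
    print(top(rank_transform(A), 1, 5, 3))  # [3, 5, 1], same positions
```

Because answers depend only on comparisons, any order-preserving relabeling of A (such as its rank array) yields identical results; an encoding exploits exactly this to discard A.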
Highlights
A frequent problem in data and log mining applications is to find the highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, the most popular terms on Twitter, the most frequent queries on Google, and so on.
Chan and Wilkinson [9] manage to store them in O(n(lg κ + lg lg n +/κ)) bits, which gives O(n lg n) bits when added over a set of suitable κ values.
We have studied for the first time the problem of encoding data structures for array range queries sel(·) and top(·), which return the kth largest element or all the top-k elements, respectively, of any interval A[i, j].
Summary
A frequent problem in data and log mining applications is to find the highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, the most popular terms on Twitter, the most frequent queries on Google, and so on. Jørgensen and Larsen [23] introduced the κ-capped range selection problem, where a parameter κ is provided at preprocessing time and the data structure supports selection only for ranks 1 ≤ k ≤ κ (as explained, interesting encodings can only solve this κ-capped version of the problem). They showed that even the one-sided κ-capped range selection problem requires query time Ω(lg k / lg lg n) for structures using O(n polylog n) words; the result of Chan and Wilkinson [9] is the best possible for that space. This is mostly interesting for low values of κ, generalizing the existing structures that solve the case κ = 1 [13].
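For the κ = 1 case (range-maximum queries), one well-known encoding keeps only the shape of the Cartesian tree of A: the position of the maximum in A[i, j] is the lowest common ancestor of i and j in that tree, so the values themselves are not needed at query time. The Python sketch below illustrates this idea only; we do not claim it is the structure of [13]. Assumptions (ours): 0-based positions, distinct values, and a plain parent array with a naive LCA walk, instead of the compact tree representation (the shape fits in about 2n bits) and fast LCA support an actual encoding would use. All function names are hypothetical.

```python
from typing import List, Sequence


def cartesian_parents(A: Sequence[int]) -> List[int]:
    """Parent array of the max-Cartesian tree of A (root has parent -1).

    Assumes distinct values. Uses the standard left-to-right stack over the
    tree's right spine, so construction takes O(n) time.
    """
    parent = [-1] * len(A)
    stack: List[int] = []
    for p in range(len(A)):
        last = -1
        # Pop spine nodes smaller than A[p]; the last one popped becomes p's left child.
        while stack and A[stack[-1]] < A[p]:
            last = stack.pop()
        if last != -1:
            parent[last] = p
        # p becomes the right child of the remaining spine top, if any.
        if stack:
            parent[p] = stack[-1]
        stack.append(p)
    return parent


def depths(parent: List[int]) -> List[int]:
    """Depth of every node (root depth 0), computed from the parent array alone."""
    depth = [-1] * len(parent)
    for v in range(len(parent)):
        path = []
        u = v
        # Climb until reaching past the root or a node whose depth is known.
        while u != -1 and depth[u] == -1:
            path.append(u)
            u = parent[u]
        d = -1 if u == -1 else depth[u]
        for w in reversed(path):
            d += 1
            depth[w] = d
    return depth


def range_max_pos(parent: List[int], depth: List[int], i: int, j: int) -> int:
    """Position of the maximum of A[i..j] (0-based, inclusive) without reading A.

    It is the lowest common ancestor of i and j in the Cartesian tree; this
    naive walk costs time proportional to the tree depth.
    """
    u, v = i, j
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


if __name__ == "__main__":
    A = [5, 1, 9, 3, 7]
    parent = cartesian_parents(A)
    depth = depths(parent)
    del A  # the queries below use only the tree, not the values
    print(parent)                              # [2, 0, -1, 4, 2]
    print(range_max_pos(parent, depth, 1, 3))  # 2  (value 9)
    print(range_max_pos(parent, depth, 3, 4))  # 4  (value 7)
```

Only `cartesian_parents` reads A; after construction every query touches just the tree. The tree shape does not, however, determine the order between elements in different subtrees, which is why supporting ranks k > 1 requires storing more information, as the κ-capped lower bound above reflects.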