Abstract

Given an array A[1, n] of elements with a total order, we consider the problem of building a data structure that solves two queries: (a) selection queries receive a range [i, j] and an integer k and return the position of the kth largest element in A[i, j]; (b) top-k queries receive [i, j] and k and return the positions of the k largest elements in A[i, j]. These problems can be solved in optimal time, O(1 + lg k / lg lg n) and O(k), respectively, using linear-space data structures. We provide the first study of the encoding data structures for the above problems, where A cannot be accessed at query time. Several applications are interested in the relative order of the entries of A, and their positions, rather than their actual values, and thus we do not need to keep A at query time. In those cases, encodings save storage space: we first show that any encoding answering such queries requires n lg k - O(n + k lg k) bits of space; then, we design encodings using O(n lg k) bits, that is, asymptotically optimal up to constant factors, while preserving optimal query time.
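To make the two queries concrete, the following is a minimal brute-force sketch in Python. It is not one of the data structures discussed here: it simply sorts the range at query time. The function names `top_k`, `sel`, and `rank_reduce` are illustrative choices, positions are 1-based as in A[1, n], and the sketch assumes distinct values and k ≤ j - i + 1. The last function illustrates why an encoding can discard the actual values: replacing each entry by its rank leaves every answer unchanged.

```python
from typing import List, Sequence


def top_k(A: Sequence, i: int, j: int, k: int) -> List[int]:
    """Positions (1-based) of the k largest elements of A[i..j], largest first.

    Brute force: sorts the whole range, so it takes O((j-i+1) lg(j-i+1)) time.
    """
    positions = range(i, j + 1)
    # Sort positions by the value they hold, largest value first.
    return sorted(positions, key=lambda p: A[p - 1], reverse=True)[:k]


def sel(A: Sequence, i: int, j: int, k: int) -> int:
    """Position (1-based) of the k-th largest element of A[i..j]."""
    return top_k(A, i, j, k)[k - 1]


def rank_reduce(A: Sequence) -> List[int]:
    """Replace each value by its rank (1 = smallest).

    Assumes distinct values; only the relative order of A survives.
    """
    order = sorted(range(len(A)), key=lambda p: A[p])
    ranks = [0] * len(A)
    for r, p in enumerate(order, start=1):
        ranks[p] = r
    return ranks


if __name__ == "__main__":
    A = [31, 7, 52, 18, 44, 9]
    print(top_k(A, 2, 5, 2))               # positions of the two largest in A[2..5]
    print(sel(A, 2, 5, 2))                 # position of the 2nd largest in A[2..5]
    print(top_k(rank_reduce(A), 2, 5, 2))  # same answer from ranks alone
```

Because both queries return positions rather than values, any representation that preserves the relative order of the entries answers them identically, which is the property the encodings exploit.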

Highlights

  • A frequent problem in data and log mining applications is to find highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, most popular terms in Twitter, most frequent queries in Google, and so on

  • Chan and Wilkinson [9] manage to store them in O(n(lg κ + lg lg n +/κ)) bits, which gives O(n lg n) bits when added over a set of suitable κ values

  • We have studied for the first time the problem of encoding data structures for array range queries sel(·) and top(·), which return the kth largest element or all the top-k elements, respectively, of any interval A[i, j]


Summary

Introduction

A frequent problem in data and log mining applications is to find highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, most popular terms in Twitter, most frequent queries in Google, and so on. Jørgensen and Larsen [23] introduced the κ-capped range selection problem, where a parameter κ is provided at preprocessing time, and the data structure only supports selection for ranks 1 ≤ k ≤ κ (as explained, interesting encodings can only solve this κ-capped version of the problem). They showed that even the one-sided κ-capped range selection problem requires query time Ω(lg k / lg lg n) for structures using O(n polylog n) words; the result of Chan and Wilkinson is the best possible for that space. This is mostly interesting for low values of κ, generalizing the existing structures that solve the case κ = 1 [13].
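The case κ = 1 is the range maximum query: return the position of the largest element in A[i, j]. As a rough illustration, the sketch below uses the classical sparse-table technique, which is a swapped-in textbook method and not the structure of [13]. It uses O(n lg n) words and still reads A at query time, so it is not an encoding. The names `build_sparse_table` and `range_max_pos` are illustrative, and positions are 1-based.

```python
from typing import List, Sequence


def build_sparse_table(A: Sequence) -> List[List[int]]:
    """table[l][s] = 0-based index of the maximum of the window A[s .. s + 2^l - 1]."""
    n = len(A)
    table = [list(range(n))]  # level 0: windows of length 1
    length = 1
    while 2 * length <= n:
        prev = table[-1]
        row = []
        for s in range(n - 2 * length + 1):
            # Combine the two half-windows of size `length`.
            left, right = prev[s], prev[s + length]
            row.append(left if A[left] >= A[right] else right)
        table.append(row)
        length *= 2
    return table


def range_max_pos(A: Sequence, table: List[List[int]], i: int, j: int) -> int:
    """1-based position of the largest element of A[i..j] (1-based, inclusive)."""
    lo, hi = i - 1, j - 1
    level = (hi - lo + 1).bit_length() - 1  # largest l with 2^l <= range length
    # Two overlapping windows of length 2^level cover the whole range.
    left = table[level][lo]
    right = table[level][hi - (1 << level) + 1]
    return (left if A[left] >= A[right] else right) + 1


if __name__ == "__main__":
    A = [31, 7, 52, 18, 44, 9]
    table = build_sparse_table(A)
    print(range_max_pos(A, table, 2, 5))
    print(range_max_pos(A, table, 4, 6))
```

A κ-capped structure extends this kind of query from the single maximum to any rank k ≤ κ.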

Bit-vectors
Sequences
Parentheses and trees
Predecessor queries
Lower bounds
General approach
Shallow cuttings
Optimal-time select queries
Encodings for optimal-time select queries
Shallow cuttings in succinct space
Computing next-larger and prev-larger queries
Constant-time access to Ev
Marking nodes
Handling marked nodes
Handling unmarked nodes
Retrieving the positions of original points
Retrieving the positions of inherited points
Predecessor queries on Ev
Handling large κ values
Handling small κ values
Wrapping up
One-sided queries
Solving top-κ queries
Conclusions