Asymptotically Optimal Encodings of Range Data Structures for Selection and Top-k Queries

Roberto Grossi, Gonzalo Navarro, S Rao Satti, John Iacono, Rajeev Raman

doi:10.1145/3012939

Abstract

Given an array A[1, n] of elements with a total order, we consider the problem of building a data structure that solves two queries: (a) selection queries receive a range [i, j] and an integer k and return the position of the kth largest element in A[i, j]; (b) top-k queries receive [i, j] and k and return the positions of the k largest elements in A[i, j]. These problems can be solved in optimal time, O(1 + lg k / lg lg n) and O(k), respectively, using linear-space data structures. We provide the first study of encoding data structures for the above problems, where A cannot be accessed at query time. Several applications are interested in the relative order of the entries of A and their positions, rather than their actual values, and thus we do not need to keep A at query time. In those cases, encodings save storage space: we first show that any encoding answering such queries requires n lg k − O(n + k lg k) bits of space; then, we design encodings using O(n lg k) bits, that is, asymptotically optimal up to constant factors, while preserving optimal query time.
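To make the query semantics concrete, here is a brute-force Python reference. It is a sketch that assumes distinct values and 1-based inclusive ranges, and it sorts the range on every call, so it shows only what the queries return, not the optimal-time encodings. The helper names `sel`, `top`, and `rank_transform` are hypothetical; `sel` and `top` simply mirror the sel(·) and top(·) notation. The block also checks the observation behind encodings: replacing A by the ranks of its entries leaves every answer unchanged, so the values themselves are not needed at query time.

```python
from typing import List, Sequence


def _check(n: int, i: int, j: int, k: int) -> None:
    # Ranges are 1-based and inclusive, as in A[1, n]; k must fit in the range.
    if not (1 <= i <= j <= n and 1 <= k <= j - i + 1):
        raise ValueError("need 1 <= i <= j <= n and 1 <= k <= j - i + 1")


def _by_decreasing_value(a: Sequence[int], i: int, j: int) -> List[int]:
    # 1-based positions of a[i..j], largest value first (values assumed distinct).
    return sorted(range(i, j + 1), key=lambda p: a[p - 1], reverse=True)


def sel(a: Sequence[int], i: int, j: int, k: int) -> int:
    """Position of the k-th largest element of a[i..j]."""
    _check(len(a), i, j, k)
    return _by_decreasing_value(a, i, j)[k - 1]


def top(a: Sequence[int], i: int, j: int, k: int) -> List[int]:
    """Positions of the k largest elements of a[i..j], largest first."""
    _check(len(a), i, j, k)
    return _by_decreasing_value(a, i, j)[:k]


def rank_transform(a: Sequence[int]) -> List[int]:
    """Replace each value by its rank (1 = smallest), keeping only relative order."""
    ranks = [0] * len(a)
    for r, p in enumerate(sorted(range(len(a)), key=lambda q: a[q]), start=1):
        ranks[p] = r
    return ranks


A = [3, 9, 4, 7, 1, 8]
R = rank_transform(A)  # [2, 6, 3, 4, 1, 5]
# Every sel query gives the same answer on A and on its rank sequence R.
assert all(
    sel(A, i, j, k) == sel(R, i, j, k)
    for i in range(1, len(A) + 1)
    for j in range(i, len(A) + 1)
    for k in range(1, j - i + 2)
)
print(sel(A, 2, 5, 2), top(A, 1, 6, 3))  # 4 [2, 6, 4]
```

For example, in A[2, 5] = 9, 4, 7, 1 the second largest value is 7, at position 4.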

Highlights

A frequent problem in data and log mining applications is to find the highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, most popular terms in Twitter, most frequent queries in Google, and so on
Chan and Wilkinson [9] manage to store them in O(n(lg κ + lg lg n +/κ)) bits, which gives O(n lg n) bits when added over a set of suitable κ values
We have studied for the first time the problem of encoding data structures for array range queries sel(·) and top(·), which return the kth largest element or all the top-k elements, respectively, of any interval A[i, j]

Summary

Introduction

A frequent problem in data and log mining applications is to find the highest or lowest values in a range of a stream: the coldest days in a time period, peaks in the stock market, most popular terms in Twitter, most frequent queries in Google, and so on. Jørgensen and Larsen [23] introduced the κ-capped range selection problem, where a parameter κ is provided at preprocessing time, and the data structure only supports selection for ranks 1 ≤ k ≤ κ (as explained, interesting encodings can only solve this κ-capped version of the problem). They showed that even the one-sided κ-capped range selection problem requires query time Ω(lg k / lg lg n) for structures using O(n polylog n) words; the result of Chan and Wilkinson is the best possible for that space. This is mostly interesting for low values of κ, generalizing the existing structures that solve the case κ = 1 [13].
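For κ = 1, selection reduces to range-maximum queries. A classic way to answer them without A is the max-Cartesian tree: the maximum of A[i, j] is the lowest common ancestor (LCA) of positions i and j, so only the tree's shape has to be kept, and the shape of a binary tree on n nodes fits in about 2n bits. The Python sketch below is illustrative only and not necessarily the structure of [13]; the names `cartesian_parents`, `depths`, and `RangeMaxEncoding` are hypothetical, values are assumed distinct, and the naive LCA climb costs O(n) time per query rather than the constant time achievable with succinct tree representations.

```python
from typing import List, Sequence


def cartesian_parents(a: Sequence[int]) -> List[int]:
    """Parent array (0-based, -1 at the root) of the max-Cartesian tree of a.

    Standard O(n) stack construction; assumes distinct values.
    """
    parent = [-1] * len(a)
    stack: List[int] = []  # right spine of the tree built so far
    for p, x in enumerate(a):
        last = -1
        while stack and a[stack[-1]] < x:
            last = stack.pop()
        if last != -1:
            parent[last] = p  # the popped part of the spine becomes p's left subtree
        if stack:
            parent[p] = stack[-1]  # p becomes the right child of the remaining spine node
        stack.append(p)
    return parent


def depths(parent: Sequence[int]) -> List[int]:
    """Depth of every node (root has depth 0), computed from the parent array."""
    depth = [-1] * len(parent)
    for v in range(len(parent)):
        path = []
        u = v
        while u != -1 and depth[u] == -1:
            path.append(u)
            u = parent[u]
        d = -1 if u == -1 else depth[u]
        for w in reversed(path):
            d += 1
            depth[w] = d
    return depth


class RangeMaxEncoding:
    """Answers sel with k = 1 (range maximum) from the tree shape alone."""

    def __init__(self, a: Sequence[int]) -> None:
        self.parent = cartesian_parents(a)
        self.depth = depths(self.parent)
        # The array a itself is not stored; queries never look at the values.

    def argmax(self, i: int, j: int) -> int:
        """1-based position of the maximum of A[i..j]: the LCA of nodes i and j."""
        u, v = i - 1, j - 1
        while self.depth[u] > self.depth[v]:
            u = self.parent[u]
        while self.depth[v] > self.depth[u]:
            v = self.parent[v]
        while u != v:  # naive LCA: climb both nodes in lockstep
            u, v = self.parent[u], self.parent[v]
        return u + 1


enc = RangeMaxEncoding([3, 9, 4, 7, 1, 8])
print(enc.argmax(3, 5), enc.argmax(1, 6))  # 4 2
```

The design choice here is that the parent array, not the values, is the whole query-time state; a real encoding would replace it by a succinct representation of the same tree shape.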

Bit-vectors

Sequences

Parentheses and trees

Predecessor queries

Lower bounds

General approach

Shallow cuttings

Optimal-time select queries

Encodings for optimal-time select queries

Shallow cuttings in succinct space

Computing next-larger and prev-larger queries

Constant-time access to Ev

Marking nodes

Handling marked nodes

Handling unmarked nodes

Retrieving the positions of original points

Retrieving the positions of inherited points

Predecessor queries on Ev

Handling large κ values

Handling small κ values

Wrapping up

One-sided queries

Solving top-κ queries

Conclusions


Journal: ACM Transactions on Algorithms
Publication Date: Mar 6, 2017
Citations: 9
License type: CC BY
