Abstract

We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions. A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are “robust” to the ordering of the data. That is, we prove lower bounds for the case where the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permutations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems. Our results include the first random-partition communication lower bounds for problems including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation. A short version of this article is available in the Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC'08), ACM, pp. 641-650. Compared to the conference presentation, this version considerably expands the detail of the discussion and of the proofs, and substantially changes some of the proof techniques.

Highlights

  • Since its introduction in 1979 by Yao, communication complexity [44, 32] has proven to be a powerful framework for proving lower bounds in a variety of settings, including the cell-probe and data stream models, circuit and decision tree complexity and VLSI design

  • As a consequence of the robust communication lower bounds we prove, we obtain a considerably simpler and improved multi-pass streaming lower bound for median finding

  • Stage 1: We prove a multi-round lower bound on the communication complexity of an appropriate “source problem,” which is either M-TPJ_{k,t}, as in Theorem 4.4, or TPJ_{k,t}, as in Theorem 4.9


Summary

Introduction

Since its introduction in 1979 by Yao, communication complexity [44, 32] has proven to be a powerful framework for proving lower bounds in a variety of settings, including the cell-probe and data stream models, circuit and decision tree complexity, and VLSI design. Many explicit functions can be shown to require a large amount of communication to evaluate when the input is partitioned between the players in a prescribed manner. These results imply lower bounds for various models of computation, via arguments that such partitions necessarily arise in the course of the computation. It is important to understand the complexity of problems not just in worst-case but also in “average-case” settings. To this end, we prove lower bounds in the setting where the ordering of tokens in the data stream is chosen not adversarially but randomly, from the set of all permutations. Our communication lower bounds lead to lower bounds for a number of data stream problems in the random-order model. With two passes, we obtain a space lower bound of Ω(m^{1/10}), as compared with the earlier bound of Ω(m^{3/80}).
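The two input models discussed above can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the simplest allocation distribution (each item goes to one player, chosen independently and uniformly, with no information overlap), whereas the paper allows any known distribution. It also shows one standard way a random-order stream induces a partition: cut the stream into contiguous blocks, one per player.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def random_partition(
    items: Sequence[T], num_players: int, rng: random.Random
) -> Dict[int, List[T]]:
    """Give each item to one player chosen independently and uniformly.

    Assumption for illustration: uniform allocation without overlap.
    """
    allocation: Dict[int, List[T]] = {p: [] for p in range(num_players)}
    for item in items:
        allocation[rng.randrange(num_players)].append(item)
    return allocation


def random_order_stream(items: Sequence[T], rng: random.Random) -> List[T]:
    """Return the items in a uniformly random order (random-order stream)."""
    stream = list(items)
    rng.shuffle(stream)
    return stream


def contiguous_blocks(stream: Sequence[T], num_players: int) -> List[List[T]]:
    """Cut a stream into num_players consecutive blocks of near-equal length.

    Player p holds block p; a streaming algorithm can then be simulated by
    the players passing its memory state from one block to the next.
    """
    size, extra = divmod(len(stream), num_players)
    blocks: List[List[T]] = []
    start = 0
    for p in range(num_players):
        end = start + size + (1 if p < extra else 0)
        blocks.append(list(stream[start:end]))
        start = end
    return blocks


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so runs are repeatable
    tokens = list(range(12))
    print(random_partition(tokens, 3, rng))
    print(contiguous_blocks(random_order_stream(tokens, rng), 3))
```

In this simulation, the memory state handed from one block to the next plays the role of a message, which is why a communication lower bound that holds for typical partitions translates into a space lower bound for random-order streams.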

Notation and preliminaries
The communication model
Technique preliminaries
Preliminary lemmas
Multi-party set disjointness
Pointer jumping and selection
Weight-based TPJ and a reduction to selection
A robust multi-player lower bound
A robust two-player lower bound
Hamming distance
Robust lower bounds for data stream computation
Frequency moments
Distinct elements and entropy
Selection
Graph streaming
Information divergences
A Distance between binomial distributions