Abstract

A recent model for property testing of probability distributions [CFGM13, CRS15] enables tremendous savings in the sample complexity of testing algorithms, by allowing them to condition the sampling on subsets of the domain. In particular, Canonne, Ron, and Servedio [CRS15] showed that, in this setting, testing identity of an unknown distribution D (i.e., whether D = D∗ for an explicitly known D∗) can be done with a constant number of queries, independent of the support size n, in contrast to the Θ(√n) samples required in the standard sampling model. However, it was unclear whether the same held for the case of testing equivalence, where both distributions are unknown. Indeed, while Canonne, Ron, and Servedio [CRS15] established a polylog(n)-query upper bound for equivalence testing, very recently brought down to O(log log n) by Falahatgar et al. [FJO+15], whether a dependence on the domain size n is necessary was still open, and was explicitly posed by Fischer at the Bertinoro Workshop on Sublinear Algorithms [Sublinear.info, Problem 66]. In this work, we answer the question in the positive, showing that any testing algorithm for equivalence must make Ω(√(log log n)) queries in the conditional sampling model. Interestingly, this demonstrates an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sample complexity n^Θ(1)).

Turning to another question, we investigate the complexity of support-size estimation. We provide a doubly-logarithmic upper bound for the adaptive version of this problem, generalizing work of Ron and Tsur [RT14] to our weaker model. We also establish a logarithmic lower bound for the non-adaptive version of this problem. This latter result carries over to the related problem of non-adaptive uniformity testing, an exponential improvement over previous results that resolves an open question of Chakraborty, Fischer, Goldhirsh, and Matsliah [CFGM13].
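For quick reference, the bounds discussed in the abstract can be summarized schematically as follows; constants and the dependence on the proximity parameter ε are omitted.

```latex
% Schematic summary of the query complexities stated in the abstract.
% SAMP = standard sampling model, COND = conditional sampling model.
\begin{align*}
\text{Identity testing (known } D^\ast\text{):}\quad
  & \Theta(\sqrt{n}) \ \text{in SAMP},
  && O(1) \ \text{in COND};\\
\text{Equivalence testing (both unknown):}\quad
  & n^{\Theta(1)} \ \text{in SAMP},
  && O(\log\log n) \ \text{and} \ \Omega\big(\sqrt{\log\log n}\big) \ \text{in COND}.
\end{align*}
```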

Highlights

  • We focus in this paper on proving lower bounds for testing two extremely natural properties of distributions, namely equivalence testing (“are these two distributions identical?”) and support-size estimation (“how many different outcomes can be observed?”).

  • The hope is that allowing a richer set of queries to the unknown underlying distributions might significantly reduce the number of samples the algorithms need, thereby sidestepping the strong lower bounds that hold in the standard sampling model.

  • The uniformity testing problem exemplifies the savings granted by conditional sampling: as Canonne, Ron, and Servedio [14] showed, in this setting only O(1/ε²) adaptive queries are sufficient.

Summary

Introduction

Understanding properties and characteristics of an unknown probability distribution is a fundamental problem in statistics, and one that has been thoroughly studied. It is only since the work of Goldreich and Ron [27] and Batu et al. [9], however, that the problem has been considered through the lens of theoretical computer science, in the setting of property testing. Several extensions of the standard sampling model have since been proposed; among these is the conditional oracle model [16, 14], which will be the focus of our work. In this setting, the testing algorithm is given the ability to sample from conditional distributions: that is, to specify a subset S of the domain and obtain samples from D_S, the distribution induced by D on S. The hope is that allowing a richer set of queries to the unknown underlying distributions might significantly reduce the number of samples the algorithms need, thereby sidestepping the strong lower bounds that hold in the standard sampling model.
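To make the query model concrete, here is a minimal sketch of such a conditional sampling oracle in Python. The class name CondOracle, its interface, and the handling of zero-mass query sets are our own illustrative choices and are not taken from [16, 14].

```python
import random


class CondOracle:
    """Minimal sketch of a conditional-sampling (COND) oracle.

    The oracle holds a distribution D over the domain {0, ..., n-1}. A query
    specifies a non-empty subset S of the domain, and the oracle answers with
    a sample drawn from D conditioned on S (i.e., D restricted to S and
    renormalized). Querying S = {0, ..., n-1} recovers the standard sampling
    (SAMP) oracle.
    """

    def __init__(self, probabilities):
        # probabilities[i] = D(i); assumed non-negative and summing to 1.
        self.probabilities = probabilities

    def sample(self, subset):
        subset = sorted(subset)
        weights = [self.probabilities[i] for i in subset]
        total = sum(weights)
        if total == 0:
            # Degenerate case D(S) = 0: conventions differ in the literature;
            # here we arbitrarily return a uniformly random element of S.
            return random.choice(subset)
        return random.choices(subset, weights=weights, k=1)[0]


# Toy usage: a single conditional query on a pair of points, the kind of
# query used as a basic primitive by conditional-sampling testers.
if __name__ == "__main__":
    oracle = CondOracle([0.5, 0.25, 0.125, 0.125])
    print(oracle.sample({1, 3}))  # sample from D conditioned on {1, 3}
```

A tester in this model interacts with the unknown distribution only through such queries, choosing the sets S either non-adaptively or adaptively, based on the answers to earlier queries.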

Motivation for the conditional model
Background and previous work
Our results
Relation to the Ron-Tsur model
Techniques and proof ideas
Notation and sampling models
Adaptive core testers
On the use of Yao’s Principle in our lower bounds
Chernoff bounds for binomials and hypergeometrics
A lower bound for equivalence testing
Construction
Analysis
Banning “bad queries”
Key lemma
Lower bounds for non-adaptive algorithms
An upper bound for support-size estimation
A non-adaptive upper bound