Abstract

A recent model for property testing of probability distributions [CFGM13, CRS15] enables tremendous savings in the sample complexity of testing algorithms by allowing them to condition the sampling on subsets of the domain. In particular, Canonne, Ron, and Servedio [CRS15] showed that, in this setting, testing identity of an unknown distribution D (i.e., whether D = D* for an explicitly known D*) can be done with a constant number of samples, independent of the support size n, in contrast to the √n samples required in the standard sampling model. However, it was unclear whether the same held for testing equivalence, where both distributions are unknown. Indeed, while Canonne, Ron, and Servedio [CRS15] established a polylog(n)-query upper bound for equivalence testing, very recently brought down to O(log log n) by Falahatgar et al. [FJO+15], whether a dependence on the domain size n is necessary was still open, and was explicitly posed by Fischer at the Bertinoro Workshop on Sublinear Algorithms [Sublinear.info, Problem 66]. In this work, we answer the question in the positive, showing that any testing algorithm for equivalence must make Ω(√(log log n)) queries in the conditional sampling model. Interestingly, this demonstrates an intrinsic qualitative gap between identity and equivalence testing, absent in the standard sampling model (where both problems have sample complexity n^Θ(1)).

Turning to another question, we investigate the complexity of support size estimation. We provide a doubly-logarithmic upper bound for the adaptive version of this problem, generalizing work of Ron and Tsur [RT14] to our weaker model. We also establish a logarithmic lower bound for the non-adaptive version of this problem. This latter result carries over to the related problem of non-adaptive uniformity testing, an exponential improvement over previous results that resolves an open question of Chakraborty, Fischer, Goldhirsh, and Matsliah [CFGM13].

∗ EECS, MIT. Email: jayadev@csail.mit.edu. Research supported by a grant from the MITEI-Shell program.
† Columbia University. Email: ccanonne@cs.columbia.edu. Research supported by NSF CCF-1115703 and NSF CCF-1319788.
‡ EECS, MIT. Email: g@csail.mit.edu.

Highlights

  • We focus in this paper on proving lower bounds for testing two extremely natural properties of distributions, namely equivalence testing (“are these two distributions identical?”) and support-size estimation (“how many different outcomes can be observed?”).

  • The hope is that allowing a richer set of queries to the unknown underlying distributions might significantly reduce the number of samples the algorithms need, thereby sidestepping the strong lower bounds that hold in the standard sampling model.

  • The uniformity testing problem exemplifies the savings granted by conditional sampling: as Canonne, Ron, and Servedio [14] showed, in this setting only Õ(1/ε²) adaptive queries are sufficient.


Summary

Introduction

Understanding properties and characteristics of an unknown probability distribution is a fundamental problem in statistics, and one that has been thoroughly studied. It is only since the work of Goldreich and Ron [27] and Batu et al. [9] that the problem has been considered through the lens of theoretical computer science, more specifically in the setting of property testing. Among the models that have since been proposed is the conditional oracle model [16, 14], which will be the focus of our work. In this setting, the testing algorithm is given the ability to sample from conditional distributions: that is, to specify a subset S of the domain and obtain samples from D_S, the distribution induced by D on S. The hope is that allowing a richer set of queries to the unknown underlying distributions might significantly reduce the number of samples the algorithms need, thereby sidestepping the strong lower bounds that hold in the standard sampling model.
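To make the conditional oracle concrete, here is a minimal Python sketch of a simulator for it. It is an illustration only, not code from the paper: the names `CondOracle` and `estimate_pair_weight` are hypothetical, the distribution is assumed to be given as an explicit probability vector over {0, ..., n-1}, and rejecting queries with D(S) = 0 is a convention chosen for this sketch. The helper shows one thing conditioning makes cheap: querying the two-element set S = {x, y} reveals the relative weight D(x) / (D(x) + D(y)), even when both elements are very unlikely under D.

```python
import random


class CondOracle:
    """Hypothetical simulator of a conditional-sampling oracle for a
    distribution D over {0, ..., n-1}, given as a probability vector."""

    def __init__(self, probs, seed=None):
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.probs = list(probs)
        self.rng = random.Random(seed)
        self.queries = 0  # number of oracle calls made so far

    def sample(self, subset=None):
        """Return one draw from D_S, the distribution induced by D on S.

        subset=None stands for S = whole domain (an ordinary sample).
        Assumption of this sketch: a query with D(S) = 0 is rejected.
        """
        if subset is None:
            support = list(range(len(self.probs)))
        else:
            support = sorted(set(subset))
        if not support:
            raise ValueError("query set S must be non-empty")
        weights = [self.probs[i] for i in support]
        if sum(weights) == 0:
            raise ValueError("D(S) = 0: conditional distribution undefined")
        self.queries += 1
        # random.choices normalizes the weights, which is exactly conditioning on S.
        return self.rng.choices(support, weights=weights, k=1)[0]


def estimate_pair_weight(oracle, x, y, num_queries=1000):
    """Estimate D(x) / (D(x) + D(y)) by conditioning on S = {x, y} (x != y)."""
    hits = sum(oracle.sample({x, y}) == x for _ in range(num_queries))
    return hits / num_queries
```

For instance, with `oracle = CondOracle([0.5, 0.25, 0.25], seed=0)`, the call `estimate_pair_weight(oracle, 0, 1)` should return a value close to 2/3, and `oracle.queries` records how many conditional queries were spent, which is the resource the bounds in this paper count.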

Motivation for the conditional model
Background and previous work
Our results
Relation to the Ron-Tsur model
Techniques and proof ideas
Notation and sampling models
Adaptive core testers
On the use of Yao’s Principle in our lower bounds
Chernoff bounds for binomials and hypergeometrics
A lower bound for equivalence testing
Construction
Analysis
Banning “bad queries”
Key lemma
Lower bounds for non-adaptive algorithms
An upper bound for support-size estimation
A non-adaptive upper bound
