Testing Probability Distributions using Conditional Samples

Clément L. Canonne, Rocco A. Servedio, Dana Ron

doi:10.1137/130945508

Abstract

We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle. This is an oracle that takes as input a subset $S \subseteq [N]$ of the domain $[N]$ of the unknown probability distribution ${\cal D}$ and returns a draw from the conditional probability distribution ${\cal D}$ restricted to $S$. This new model allows considerable flexibility in the design of distribution testing algorithms; in particular, testing algorithms in this model can be adaptive. We study a wide range of natural distribution testing problems in this new framework and some of its variants, giving both upper and lower bounds on query complexity. These problems include testing whether ${\cal D}$ is the uniform distribution ${\cal U}$; testing whether ${\cal D} = {\cal D}^\ast$ for an explicitly provided ${\cal D}^\ast$; testing whether two unknown distributions ${\cal D}_1$ and ${\cal D}_2$ are equivalent; and estimating the variation distance between ${\cal D}$ and the uniform distribution. At a high level, our main finding is that the new conditional sampling framework we consider is a powerful one: while all the problems mentioned above have $\Omega(\sqrt{N})$ sample complexity in the standard model (and in some cases the complexity must be almost linear in $N$), we give ${\rm poly}(\log N, 1/\epsilon)$-query algorithms (and in some cases ${\rm poly}(1/\epsilon)$-query algorithms independent of $N$) for all these problems in our conditional sampling setting.
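The conditional sampling oracle described above can be illustrated with a minimal Python sketch. It is a simulation for intuition only, not the authors' construction: the function name `cond_sample`, the dictionary representation of ${\cal D}$, and the choice to raise an error when $S$ has zero probability mass are all assumptions made here. A real tester would only have black-box access to the oracle, not to the probabilities themselves.

```python
import random
from typing import Dict, Iterable


def cond_sample(dist: Dict[int, float], subset: Iterable[int], rng=random) -> int:
    """Simulate one conditional-sampling query (hypothetical helper).

    dist   : maps each domain element i to its probability D(i).
    subset : the query set S, a subset of the domain.
    Returns an element i of S drawn with probability D(i) / D(S).
    """
    # Keep only the elements of S that carry positive mass under D.
    elems = [i for i in sorted(set(subset)) if dist.get(i, 0.0) > 0.0]
    if not elems:
        # Assumption of this sketch: a query with D(S) = 0 is rejected.
        raise ValueError("query set has zero probability mass under D")
    weights = [dist[i] for i in elems]
    # random.choices normalizes the weights, which is exactly the
    # restriction of D to S.
    return rng.choices(elems, weights=weights, k=1)[0]


if __name__ == "__main__":
    D = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
    # Conditioning on S = {3, 4} compares two elements directly:
    # each should appear about half of the time.
    draws = [cond_sample(D, {3, 4}) for _ in range(1000)]
    print(draws.count(3) / len(draws))
```

Querying a two-element set such as $\{3, 4\}$ compares the relative weight of two domain elements directly, something the standard sampling model cannot do without many draws; this is one way to see why adaptively chosen query sets add power.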
