Abstract
Let 𝒞 be a class of probability distributions over the discrete domain [n] = {1, …, n}. We show that if 𝒞 satisfies a rather general condition – essentially, that each distribution in 𝒞 can be well-approximated by a variable-width histogram with few bins – then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of k unknown distributions from 𝒞. We analyze several natural types of distributions over [n], including log-concave, monotone hazard rate, and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all of these mixture learning problems, as described below.

Log-concave distributions: We learn any mixture of k log-concave distributions over [n] using k · Õ(1/ε^4) samples (independent of n) and running in time Õ(k log(n)/ε^4) bit-operations (note that reading a single sample from [n] takes Θ(log n) bit-operations). For the special case k = 1 we give an efficient algorithm using Õ(1/ε^3) samples; this generalizes the main result of [DDS12b] from the class of Poisson Binomial distributions to the much broader class of all log-concave distributions. Our upper bounds are not far from optimal, since any algorithm for this learning problem requires Ω(k/ε^{5/2}) samples.

Monotone hazard rate (MHR) distributions: We learn any mixture of k MHR distributions over [n] using O(k log(n/ε)/ε^4) samples and running in time Õ(k log(n)/ε^4) bit-operations. Any algorithm for this learning problem must use Ω(k log(n)/ε^3) samples.

Unimodal distributions: We give an algorithm that learns any mixture of k unimodal distributions over [n] using O(k log(n)/ε^4) samples and running in time Õ(k log^2(n)/ε^4) bit-operations. Any algorithm for this problem must use Ω(k log(n)/ε^3) samples.
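To make the structural condition concrete, here is a minimal illustrative sketch (not the paper's algorithm) of what "well-approximated by a variable-width histogram with few bins" means: take a distribution over [n], flatten it over a fixed interval partition (spread each interval's mass uniformly), and measure the total variation distance to the original. The helper names flatten_to_histogram and tv_distance, and the particular geometric choice of bins, are our own assumptions for illustration.

```python
import numpy as np

def flatten_to_histogram(p, left_endpoints):
    """Return the 'flattening' of p: on each interval of the partition,
    spread that interval's total mass uniformly over its points.
    `left_endpoints` lists the (0-indexed) left endpoint of each interval."""
    q = np.empty_like(p, dtype=float)
    bounds = list(left_endpoints) + [len(p)]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)
    return q

def tv_distance(p, q):
    """Total variation distance between two distributions on [n]."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# A truncated, renormalized geometric distribution over [n] is log-concave,
# so it should be close to a histogram with few (variable-width) bins.
n = 1000
p = 0.99 ** np.arange(n)
p /= p.sum()

# Geometrically growing bins: few bins overall, narrower where the mass sits.
left_endpoints, b = [0], 1
while b < n:
    left_endpoints.append(b)
    b = max(b + 1, int(1.3 * b))

q = flatten_to_histogram(p, left_endpoints)
print(f"{len(left_endpoints)} bins, TV distance ≈ {tv_distance(p, q):.4f}")
```

In this toy example roughly 30 bins already give a small total variation error; the paper's point is that for classes with this property (log-concave, MHR, unimodal), mixtures can be learned with sample complexity driven by the number of bins rather than by n.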