Abstract

We analyze the problem of discrete distribution estimation under l1 loss. We provide non-asymptotic upper and lower bounds on the maximum risk of the empirical distribution (the maximum likelihood estimator), and on the minimax risk, in regimes where the alphabet size S may grow with the number of observations n. We show that among distributions with entropy bounded by H, the asymptotic maximum risk of the empirical distribution is 2H / ln n, while the asymptotic minimax risk is H / ln n. Moreover, a hard-thresholding estimator, whose threshold does not depend on the unknown upper bound H, is asymptotically minimax. We draw connections between our work and the literature on density estimation, entropy estimation, total variation distance (l1 divergence) estimation, joint distribution estimation in stochastic processes, normal mean estimation, and adaptive estimation.
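To make the two estimators concrete, the following is a minimal Python sketch (not from the paper): the empirical distribution is just normalized counts, and a hard-thresholding estimator zeroes out empirical masses below a small threshold before renormalizing. The constant c, the renormalization step, and the threshold form of order ln(n)/n are illustrative assumptions here; the paper specifies the exact threshold, which notably does not depend on the entropy bound H.

```python
import numpy as np

def empirical_distribution(samples, S):
    """Maximum likelihood estimate: empirical frequencies over an alphabet of size S."""
    counts = np.bincount(samples, minlength=S)
    return counts / counts.sum()

def hard_threshold_estimate(samples, S, c=1.0):
    """Illustrative hard-thresholding estimator (assumed form, not the paper's exact rule):
    zero out empirical masses below roughly c * ln(n) / n, then renormalize."""
    n = len(samples)
    p_hat = empirical_distribution(samples, S)
    thresh = c * np.log(n) / n          # threshold independent of the entropy bound H
    p_thr = np.where(p_hat >= thresh, p_hat, 0.0)
    total = p_thr.sum()
    return p_thr / total if total > 0 else p_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, n = 1000, 5000
    p = rng.dirichlet(np.ones(S))                 # a random true distribution
    samples = rng.choice(S, size=n, p=p)
    for name, q in [("MLE", empirical_distribution(samples, S)),
                    ("hard-threshold", hard_threshold_estimate(samples, S))]:
        print(name, "l1 error:", np.abs(p - q).sum())
```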
