Abstract

A $k$-modal probability distribution over the discrete domain $\{1,\ldots,n\}$ is one whose histogram has at most $k$ “peaks” and “valleys.” Such distributions naturally generalize monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\widehat{p}$ such that, with high probability, the total variation distance between $p$ and $\widehat{p}$ is at most $\epsilon$. Our main goal is to obtain computationally efficient algorithms for this problem that use (close to) an information-theoretically optimal number of samples. We give an efficient algorithm for this problem that runs in time $\mathrm{poly}(k,\log(n),1/\epsilon)$. For $k \leq \tilde{O}(\log n)$, the number of samples used by our algorithm is very close (within an $\tilde{O}(\log(1/\epsilon))$ factor) to being information-theoretically optimal. Prior to this work, computationally efficient algorithms were known only for the cases $k=0,1$ (Birgé 1987, 1997). A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for property testing of probability distributions as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the $k$-modal distribution into $k$ (near-)monotone distributions, which are easier to learn. A preliminary version of this work appeared in the Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012).

Highlights

  • This paper considers a natural unsupervised learning problem involving k-modal distributions over the discrete domain [n] = {1, ..., n}.

  • A distribution is k-modal if the plot of its probability density function has at most k “peaks” and “valleys”. Such distributions arise in both theoretical and applied research; they naturally generalize the simpler classes of monotone (k = 0) and unimodal (k = 1) distributions that have been intensively studied in probability theory and statistics.

  • Our main aim in this paper is to give an efficient algorithm for learning an unknown k-modal distribution p to total variation distance ε, given access only to independent samples drawn from p.
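To make the definition concrete, the number of “peaks” and “valleys” of a histogram can be read off as the number of direction changes among its successive differences; a distribution over {1, ..., n} is k-modal when this count is at most k. A minimal sketch of that count (the helper name is ours, for illustration only, not from the paper):

```python
def count_direction_changes(p):
    """Count the "peaks" and "valleys" in the histogram of p.

    p is a list of probabilities over {1, ..., n}. A distribution is
    k-modal exactly when this count is at most k: 0 for monotone
    distributions, 1 for unimodal ones, and so on.
    """
    # Successive differences, skipping flat stretches (equal neighbors).
    diffs = [b - a for a, b in zip(p, p[1:]) if b != a]
    changes = 0
    for d1, d2 in zip(diffs, diffs[1:]):
        if (d1 > 0) != (d2 > 0):  # sign flip marks a peak or a valley
            changes += 1
    return changes
```

For example, a monotone increasing histogram yields 0 changes, while an increase followed by a decrease (one peak) yields 1.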


Summary

Introduction

This paper considers a natural unsupervised learning problem involving k-modal distributions over the discrete domain [n] = {1, ..., n}. A distribution is k-modal if the plot of its probability density function (pdf) has at most k “peaks” and “valleys” (see Section 2.1 for a precise definition). Our main aim in this paper is to give an efficient algorithm for learning an unknown k-modal distribution p to total variation distance ε, given access only to independent samples drawn from p. Our main contribution is a computationally efficient algorithm that has nearly optimal sample complexity for small (but super-constant) values of k.
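For reference, the total variation distance used as the learning metric is, on a discrete domain, exactly half the L1 distance between the two probability vectors. A minimal sketch:

```python
def total_variation(p, q):
    """Total variation distance between two distributions on {1, ..., n}.

    Equals half the L1 distance:
        sup_A |p(A) - q(A)| = (1/2) * sum_i |p_i - q_i|.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))
```

So learning p to total variation distance ε is the same as learning it to L1 distance 2ε, which is why the abstract states the problem in terms of the L1 distance.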

Background and relation to previous work
Our results
Our approach
Discussion
Notation and problem statement
Basic tools
Learning k-modal distributions
Warm-up: A simple learning algorithm
Main result
Algorithm Learn-kmodal and its analysis
Testing whether a k-modal distribution is monotone
Conclusions and future work
A Birgé’s algorithm as a semi-agnostic learner
B Hypothesis testing
C Using the hypothesis tester