Testing k-Modal Distributions: Optimal Algorithms via Reductions

Constantinos Daskalakis, Rocco A. Servedio, Paul Valiant, Ilias Diakonikolas, Gregory Valiant

doi:10.1137/1.9781611973105.131

Abstract

Proceedings of the 2013 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1833-1852. Published: 2013. ISBN: 978-1-61197-251-1. eISBN: 978-1-61197-310-5. Book DOI: https://doi.org/10.1137/1.9781611973105. Book Series Name: Proceedings. Book Code: PR143. Book Pages: xix + 1915.

Author affiliations (as listed): MIT, UC Berkeley, Columbia University, Microsoft, Brown University.

Funding:
Constantinos Daskalakis: Research supported by NSF CAREER award CCF-0953960 and by a Sloan Foundation Fellowship.
Ilias Diakonikolas: Research supported by a Simons Foundation Postdoctoral Fellowship. Some of this work was done while at Columbia University, supported by NSF grant CCF-0728736, and by an Alexander S. Onassis Foundation Fellowship.
Rocco A. Servedio: Supported by NSF grants CCF-0347282 and CCF-0523664.
Gregory Valiant: Supported by an NSF graduate research fellowship and an IBM PhD Fellowship.
Paul Valiant: Supported by an NSF postdoctoral research fellowship.

We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L1 (total variation) distance between two k-modal distributions p and q over the discrete domain {1, …, n}. More precisely, we consider the following four problems: given sample access to an unknown k-modal distribution p,

Testing identity to a known or unknown distribution:
1. Determine whether p = q (for an explicitly given k-modal distribution q) versus p is ε-far from q;
2. Determine whether p = q (where q is available via sample access) versus p is ε-far from q;

Estimating L1 distance ("tolerant testing") against a known or unknown distribution:
3. Approximate dTV(p, q) to within additive ε where q is an explicitly given k-modal distribution;
4. Approximate dTV(p, q) to within additive ε where q is available via sample access.

For each of these four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly(k) and multiplicative polylog log n + polylog k factors. Our algorithms significantly improve the previous results of [BKR04], which were for testing identity of distributions (items (1) and (2) above) in the special cases k = 0 (monotone distributions) and k = 1 (unimodal distributions) and required O((log n)^3) samples.

As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over {1, …, n} to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1, …, ℓ} where ℓ = O(k log n).
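
To make the quantities in the abstract concrete, the following is a minimal Python sketch of three of them: the total variation distance dTV(p, q), a check that an explicitly given distribution is k-modal (its probability mass function changes direction at most k times, so k = 0 is monotone and k = 1 is unimodal), and the coarsening of a distribution over {1, …, n} to one over a smaller domain {1, …, ℓ} by summing mass over ℓ consecutive intervals. All function names and the example partition are illustrative assumptions and do not come from the paper. In particular, `coarsen` only shows what moving to a smaller domain means; it is not the paper's reduction, which has to choose the intervals so that distances are approximately preserved.

```python
def total_variation(p, q):
    """dTV(p, q) = half the L1 distance between two pmfs of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def direction_changes(p):
    """Count how often the pmf switches between increasing and decreasing.

    Flat stretches are ignored, so [1, 2, 2, 3] counts as monotone.
    """
    changes = 0
    last = 0  # sign of the most recent non-flat step (0 = none seen yet)
    for a, b in zip(p, p[1:]):
        step = (b > a) - (b < a)  # +1 up, -1 down, 0 flat
        if step != 0:
            if last != 0 and step != last:
                changes += 1
            last = step
    return changes


def is_k_modal(p, k):
    """True if the pmf changes direction at most k times."""
    return direction_changes(p) <= k


def coarsen(p, starts):
    """Sum the mass of p over consecutive intervals.

    `starts` is a hypothetical parameter: the sorted 0-based start index
    of each interval, beginning with 0. The result is a pmf over
    len(starts) points.
    """
    ends = list(starts[1:]) + [len(p)]
    return [sum(p[s:e]) for s, e in zip(starts, ends)]


# Example: a unimodal p and a monotone q over a domain of size n = 5.
p = [0.1, 0.2, 0.4, 0.2, 0.1]
q = [0.4, 0.3, 0.2, 0.1, 0.0]
starts = [0, 1, 3]  # three intervals: {1}, {2, 3}, {4, 5}

print(direction_changes(p), direction_changes(q))
print(total_variation(p, q))
print(total_variation(coarsen(p, starts), coarsen(q, starts)))
```

Coarsening can only shrink the total variation distance, never increase it, which is why an arbitrary partition is not enough: a useful reduction needs intervals on which the distance between p and q is nearly retained.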

Full Text