Abstract

A Survey on Distribution Testing: Your Data is Big. But is it Blue?

Highlights

  • Given data from an experiment, study or population, inferring information from the underlying probability distribution it defines is a fundamental problem in Statistics and data analysis, and has applications and ramifications in a myriad of other fields

  • It may be possible to overcome the formidable complexity of the task, most of the time at the price of a slightly relaxed guarantee on the answer. (For a more eloquent exposition of these points, see, e.g., [88].) But if only one phrase and motivation were allowed to justify the whole field of distribution testing, the author could not find anything more concise and trendy than these two words: “Big Data.”

  • In Part III, we focus on the standard model for distribution testing, where the algorithm can only access the distribution by drawing independent samples from it


Summary

Introduction

Given data from an experiment, study or population, inferring information from the underlying probability distribution it defines is a fundamental problem in Statistics and data analysis, and has applications and ramifications in a myriad of other fields. This question, extensively studied for decades, has undergone a significant shift in recent years: the amount of data has grown enormously, and the corresponding distributions are often over a very large domain (see for instance [18, 60, 69]). We work in the setting of property testing as originally introduced in [90, 59], where an algorithm accesses an unknown “huge object” only through the ability to perform local “inspections.” By making only a small number of such queries to the object, the randomized algorithm must determine whether the object exhibits some prespecified property of interest, or is far from every object with the property. (For a more detailed presentation and overview of the field of property testing, the reader is referred to [53, 86, 87, 56].)
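To make the "has the property, or is far from it" dichotomy concrete, here is a minimal sketch of one classical kind of tester in the sampling model: a collision-based test for uniformity. It is only an illustration and is not taken from the survey text above. The function names, the use of total variation distance for "far," and the midpoint threshold are all assumptions made for this example, and no attempt is made to choose the number of samples optimally.

```python
from collections import Counter
import random


def collision_statistic(samples):
    """Fraction of unordered sample pairs that are equal.

    For i.i.d. samples from p this is an unbiased estimate of sum_i p_i^2.
    """
    m = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (m * (m - 1) / 2)


def looks_uniform(samples, n, eps):
    """Accept if the samples look uniform over a domain of size n.

    Assumption: "eps-far" means total variation distance at least eps.
    The uniform distribution has collision probability exactly 1/n, while
    any distribution eps-far from it has collision probability at least
    (1 + 4 * eps**2) / n. We threshold at the midpoint of the two values.
    """
    threshold = (1 + 2 * eps ** 2) / n
    return collision_statistic(samples) <= threshold


# Hypothetical usage: a uniform source versus one supported on half the domain.
rng = random.Random(0)
n = 100
uniform_samples = [rng.randrange(n) for _ in range(2000)]
skewed_samples = [rng.randrange(n // 2) for _ in range(2000)]
print(looks_uniform(uniform_samples, n, eps=0.5))
print(looks_uniform(skewed_samples, n, eps=0.5))
```

The point of the sketch is that the tester never reconstructs the distribution: it looks only at a single statistic of the samples, which is what allows the sample size to be much smaller than the domain size in the regimes the field studies.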
