Abstract

We consider the following problem. Given a population of m items, we must decide whether the population includes a relatively large cluster of identical items. This decision affects the efficiency of a subsequent computational process, depending on whether the cluster actually exists and on its size. To make a good decision, we use a statistical sample, which should indicate the existence of a cluster and supply a representative of it. This paper describes the optimal sampling technique for such a case, given the cost of sampling and the potential gain in speed of the subsequent process. We specify the optimal fixed sample size and the optimal sequential sampling procedure, and we characterize the dependence of the cost function on the truncation point.

When the a priori distribution of the cluster proportion is known, we present formulae by which the optimal sampling procedures can be easily calculated. For the common situation in which the a priori distribution is not known, we present two results that are independent of that distribution: a tight upper bound on the sample size for fixed-size sampling, and an approximately optimal truncation point for sequential sampling.

The situation described arose in connection with choosing the best sorting method, an application that will be described in full detail. The most interesting practical result is that, in our sorting application, truncating the sequential procedure at 35 observations out of a population of 25,000–30,000 items guarantees a cost within 2.1% of the optimum, independently of the a priori distribution.
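To make the idea of a truncated sequential procedure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's optimal procedure: the function name `find_cluster_representative`, the stopping rule (stop once some value has been seen `min_repeats` times), and the default `min_repeats=3` are all hypothetical. Only the truncation point of 35 observations comes from the abstract.

```python
import random
from collections import Counter


def find_cluster_representative(items, truncation=35, min_repeats=3, rng=None):
    """Sample items one at a time, without replacement, up to `truncation` draws.

    Stops early as soon as one value has been drawn `min_repeats` times and
    returns (value, draws_used). Returns (None, draws_used) if the truncation
    point is reached first. The stopping rule is an assumption made for
    illustration; the paper derives its own optimal rule.
    """
    rng = rng or random.Random()
    counts = Counter()
    limit = min(truncation, len(items))
    # Choose `limit` distinct positions up front, then inspect them in order.
    positions = rng.sample(range(len(items)), limit)
    for draws, idx in enumerate(positions, start=1):
        value = items[idx]
        counts[value] += 1
        if counts[value] >= min_repeats:
            return value, draws  # candidate cluster representative
    return None, limit  # no evidence of a large cluster within the budget


if __name__ == "__main__":
    # Hypothetical population: 25,000 items, 80% of which share the key 7.
    population = [7] * 20000 + list(range(100000, 105000))
    print(find_cluster_representative(population))
```

A caller would then choose the sorting method according to whether a representative was returned. The sketch does not model the cost of sampling or the gain from the subsequent process, which the paper uses to set the stopping rule.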
