Abstract

SUMMARY

Suppose one has several normal populations, identically distributed except for their means. At stage one a sample of size n1 is taken from each population. At stage two, a sample of size n2 is taken from the two populations producing the largest means in stage one, and the population having the largest cumulative mean is selected as best. For the least favourable configuration of means, an algorithm is developed for calculating p0, the probability of 'correct' selection. The technique involves a finite representation of the standard normal distribution, counting methods, and the use of a high-speed computer both for enumeration and for later smoothing and filtering.

A two-stage procedure for selecting the population with the largest mean from a set of normal populations with unknown means and a common known variance was proposed by Somerville (1954) and extended by Fairweather (1968). The procedure eliminates a predetermined number of populations after the first stage and, from the survivors, selects after the second stage the one with the largest cumulative mean. Costs from incorrect selection and from sampling are assumed known. For a given ratio of first-stage to second-stage sample sizes, the total sample size that minimizes the maximum expected loss was derived, the maximum being taken over all possible configurations of the true population means.

For a wide range of losses due to incorrect selection, and for a wide range of procedures (one-stage, two-stage, etc.) for selecting the 'best' population, viz. the one with the largest mean, it was shown that the maximum expected loss occurs when the means of all the populations except the best are equal. We describe this as the 'least favourable configuration'.
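The two-stage rule described above can be illustrated with a short simulation. The sketch below is a plain Monte Carlo estimate of the probability of correct selection under the least favourable configuration; it is not the finite-representation and counting algorithm developed in the paper. The function name `estimate_p0`, the parameter `delta` (the amount by which the best mean exceeds the common mean of the others), and the default of unit variance are assumptions introduced only for illustration.

```python
import numpy as np


def estimate_p0(num_pops, n1, n2, delta, sigma=1.0, keep=2,
                trials=100_000, seed=0):
    """Monte Carlo estimate of the probability of correct selection.

    Hypothetical helper, not the paper's algorithm. Least favourable
    configuration: population 0 has mean `delta`, all others have mean 0.
    """
    rng = np.random.default_rng(seed)
    means = np.zeros(num_pops)
    means[0] = delta  # index 0 is the 'best' population

    # Stage one: a sample mean of n1 observations from every population.
    xbar1 = means + rng.standard_normal((trials, num_pops)) * sigma / np.sqrt(n1)

    # Keep the `keep` populations with the largest first-stage means.
    survivors = np.argpartition(xbar1, -keep, axis=1)[:, -keep:]
    xbar1_kept = np.take_along_axis(xbar1, survivors, axis=1)

    # Stage two: a sample mean of n2 further observations from each survivor.
    noise2 = rng.standard_normal((trials, keep)) * sigma / np.sqrt(n2)
    xbar2_kept = means[survivors] + noise2

    # Cumulative mean over both stages; the largest one is selected.
    cumulative = (n1 * xbar1_kept + n2 * xbar2_kept) / (n1 + n2)
    winner = survivors[np.arange(trials), np.argmax(cumulative, axis=1)]

    return float(np.mean(winner == 0))


if __name__ == "__main__":
    # Four populations, equal stage sizes, best mean half a standard
    # deviation above the rest (all values chosen arbitrarily).
    print(estimate_p0(num_pops=4, n1=10, n2=10, delta=0.5))
```

When `delta` is zero the populations are exchangeable, so the estimate should sit near one over the number of populations, which gives a quick sanity check. Simulation error shrinks only as the square root of `trials`, which is one reason an exact or enumerative calculation may still be preferable.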
Under this configuration of means, a considerable portion of the effort in Somerville (1954) and Fairweather (1968) was devoted to evaluating p0, the probability of a correct selection, that is, of selecting the best population. For example, Fairweather showed that determining p0 involves evaluating the cumulative distribution function of a multivariate normal variable with at least five different values in the variance-covariance matrix. The dimension of the multivariate integral is 2k - t, where k + 1 is the number of populations and t is the number eliminated after the first stage. The amount of work involved in computing p0 is very large even for small values of k, and it increases exponentially with k. Fairweather's computations were thus limited to the case of four populations. Curnow & Dunnett (1962) have shown that any n-variate normal cumulative distribution function can always be written as a single integral with an (n - 1)-variate normal cumulative distribution function in the integrand, with the integration extending over a singly infinite
