Abstract
An extensive literature surrounds the distribution of the standardized sample mean discrepancy in sampling from arbitrary populations. The subject has practical interest because the assumption of an underlying normal population rarely holds, and there is good reason to have an idea, for a given sample size, of how the usual lower and upper tail probabilities are affected by sampling from a diversity of parent populations. Attacks on the subject have included (a) simulation studies for small samples (n = 5-10) from Pearson, Johnson S_U, and series distributions (see, for example, Pearson and Please 1975); (b) so-called exact results for samples from an Edgeworth population, including terms involving skewness and kurtosis parameters (Gayen 1949), and from a normal mixture (Lee and Gurland 1973); and (c) exact results for sets of sampled populations, each satisfying some mathematical regularity condition (Bradley 1952; Sansing and Owen 1974). We offer some critical comments on these and other contributions, which for the most part pre-date the facilities of modern computers.

To set the stage, let it be noted that, except for very small samples (Perlo 1933, for samples of three, for example), exact results are not known, if 'exact' is interpreted in the strictest sense. We do not regard a series expansion for the distribution function of t as exact unless there is a fairly simple expression for the remainder for |t| in some interval. Even a proof or demonstration of convergence is irrelevant if we are taking a hard look at accuracy. Similarly, the inclusion of an asymptotic order-of-magnitude term, whilst of interest mathematically, is almost useless otherwise, and certainly meaningless if we wish to pinpoint assessments for small to moderate sample sizes. Thus, it is extremely difficult to carry out error analysis for the variety of populations sampled, for various sample sizes and probability levels.

This is not to say the studies are without value; in most cases they involve sophisticated mathematical manipulations and have at least brought to light the main effects of non-normality. (In fact, to quote Geary on his own tabulations of modifications in one- and two-sample t distributions: 'It should be remarked that the probabilities in Table 3 (as well as in Table 2) are merely rough approximations; the samples used are far too small for the results to have any pretention to accuracy. The object has been merely to show that the actual probability could be considerably at variance with that shown in the standard table, for small samples.') Thus, it is generally accepted that lower tail probabilities are enhanced (and upper tail probabilities diminished) by positive skewness. In general, large samples dilute this property, whereas extreme probability levels may involve drastic modifications.

The general problem is perhaps beyond exact assessment by mathematical analysis, and approaches found in numerical analysis may be the answer. For example, numerical quadrature in practice relies for error assessment on comparisons over different procedures rather than on manipulating an awkward error term (which may indeed be just as tiresome as the original integral). Thus, comparisons of appropriate quadratures (trapezoidal, Simpson, Romberg
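As a concrete illustration of the comparison-of-quadratures style of error assessment alluded to above, the following is a minimal sketch (not taken from the paper): the same integral is estimated by trapezoidal, Simpson, and Romberg rules, and accuracy is judged by how well the estimates agree rather than by an analytic remainder term. The integrand (a standard normal density over [0, 2]), the step sizes, and the number of Romberg levels are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import simpson, trapezoid

# Integrand: standard normal density (an illustrative choice, not from the paper).
def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

a, b = 0.0, 2.0  # integration range (assumed for illustration)

def trap(n):
    """Composite trapezoidal rule with n equal subintervals."""
    x = np.linspace(a, b, n + 1)
    return trapezoid(f(x), x=x)

def romberg(levels=5):
    """Romberg integration: Richardson extrapolation of successively halved trapezoidal rules."""
    T = [[trap(2**k) for k in range(levels)]]
    for m in range(1, levels):
        T.append([(4**m * T[m - 1][k + 1] - T[m - 1][k]) / (4**m - 1)
                  for k in range(levels - m)])
    return T[-1][0]

x = np.linspace(a, b, 2**5 + 1)
estimates = {
    "trapezoidal": trap(2**5),
    "Simpson": simpson(f(x), x=x),
    "Romberg": romberg(5),
}
for name, value in estimates.items():
    print(f"{name:12s} {value:.10f}")
# The spread among the three estimates, rather than an analytic error bound,
# serves as the practical check on accuracy.
```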
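In the same empirical spirit, the earlier claim that positive parent skewness enhances the lower-tail probability of t and diminishes the upper-tail probability for small samples can be examined by simulation. The following Monte Carlo sketch is not from the paper; the exponential parent, the sample sizes n = 5 and 10, the 5% nominal level, and the replication count are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
R = 200_000  # Monte Carlo replications (an arbitrary choice)

for n in (5, 10):
    t_crit = stats.t.ppf(0.05, df=n - 1)  # nominal 5% lower-tail critical value
    # Positively skewed parent: exponential with mean 1 (illustrative choice).
    x = rng.exponential(scale=1.0, size=(R, n))
    t = (x.mean(axis=1) - 1.0) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    lower = np.mean(t <= t_crit)   # actual lower-tail probability
    upper = np.mean(t >= -t_crit)  # actual upper-tail probability
    print(f"n = {n:2d}: nominal 0.050, lower tail {lower:.3f}, upper tail {upper:.3f}")
```

Under these assumptions the estimated lower-tail probability exceeds the nominal 0.05 and the upper-tail probability falls below it, with the discrepancy shrinking as n grows.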