Abstract

The increased interest in the variability of soil properties is responsible for recent observations that soil variables are not normally distributed but are more closely approximated by the two-parameter lognormal frequency distribution. Statistical methods commonly applied in the estimation of the median of lognormally distributed data, however, are biased or inefficient. The purpose of this study was to evaluate four statistical methods for estimating, from sample data, the median of a lognormal population. The four statistical methods were: (i) the geometric mean (GM), (ii) a bias-corrected form of the geometric mean (BCGM), (iii) a uniformly minimum variance unbiased (UMVU) estimator, and (iv) the sample median (SM). In addition, two techniques for computing confidence limits about the median were evaluated. Monte Carlo simulations from four different lognormal populations were used in these evaluations to determine the efficacy of these methods as a function of both population variance and sample size (n = 4-100). Results of this work indicate that the UMVU and BCGM estimators are unbiased and yield estimates with the lowest mean square error. An example is provided that illustrates the application of these techniques.
Many of the complex environmental questions faced by society require more precise quantification of environmental variables and processes. One complicating factor in such environmental studies is the high degree of spatial variability often exhibited by natural variables. Automated data collection and analysis instrumentation has enabled investigators to collect large data sets in an attempt to deal with the problems of high variability. These developments have enabled better determination of frequency distributions. Many environmental variables exhibit skewed frequency distributions that can be approximated by the lognormal distribution.
In symmetric distributions the mean and median have the same value; in nonsymmetric distributions, such as the lognormal distribution, they differ. When such distributions occur, a choice exists concerning the summary statistic of interest. In a study of epiphytic bacterial populations on leaf surfaces, the median was chosen as the relevant summary statistic (Hirano et al., 1982). The geometric mean has also been used as a measure of central tendency for populations of bacteria in the rhizosphere (Loper et al., 1984) and for bacterial populations in aquatic environments (Greenberg et al., 1985). In contrast, it has been suggested that, for quantification of denitrification N loss from soils, the mean is a more appropriate estimator than the median (Parkin, 1991). For skewed data the choice of the appropriate summary statistic is important, as it influences the outcome of statistical tests and, therefore, data interpretation (Parkin et al., 1987; Parkin, 1991). A detailed discussion of the power of statistical tests in detecting differences in the mean vs. the median, as well as a discussion of criteria for selection of the mean vs. the median, is presented elsewhere (Parkin, 1991, 1993; Parkin and Robinson, 1992). In cases where the population median is the appropriate summary statistic, it is important to accurately estimate this quantity from the data. The choice of the optimum estimator is not the only consideration: confidence intervals for the estimator must also be computed.
T.B. Parkin, USDA-ARS National Soil Tilth Lab., 2150 Pammel Dr., Ames, IA 50011; and J.A. Robinson, 7922-190-MR, the Upjohn Co., Kalamazoo, MI 49001. Received 31 Jan. 1991. *Corresponding author. Published in Soil Sci. Soc. Am. J. 57:317-323 (1993).
In previous studies we reported on methods for estimating the mean, variance, and coefficient of variation for lognormally distributed variables (Parkin et al., 1988) as well as on methods for computing confidence limits for the lognormal mean (Parkin et al., 1990). This study extends those findings by evaluating several methods of estimating the population median, and confidence limits for the median, of a lognormally distributed variable. We report on four methods of estimating the population median from sample data and two techniques for computing confidence limits of the median. These methods were evaluated using four lognormal distributions across a range of sample sizes representative of those commonly observed in studies of environmental variables.
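The kind of Monte Carlo comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the population parameters, trial count, seed, and all function names are hypothetical choices made for illustration. For a lognormal population with log-scale parameters mu and sigma, the median is exp(mu), and the expected value of the geometric mean of n observations is exp(mu + sigma^2/(2n)), so the GM overestimates the median in small samples. The bias-corrected form used here, exp(ybar - s^2/(2n)), is an assumed textbook-style correction, since this section does not state the BCGM formula. The UMVU estimator is omitted because its defining function is not given in this section.

```python
import math
import random
import statistics


def log_stats(sample):
    """Return (mean, sample variance, n) of the log-transformed values."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    return sum(logs) / n, statistics.variance(logs), n


def geometric_mean(sample):
    """GM: exponential of the mean of the logs."""
    ybar, _, _ = log_stats(sample)
    return math.exp(ybar)


def bias_corrected_gm(sample):
    """BCGM, assumed form (not taken from the source): exp(ybar - s^2 / (2n))."""
    ybar, s2, n = log_stats(sample)
    return math.exp(ybar - s2 / (2 * n))


def sample_median(sample):
    """SM: ordinary sample median of the untransformed data."""
    return statistics.median(sample)


def simulate(mu, sigma, n, trials, seed=0):
    """Estimate bias and mean square error of each estimator by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_median = math.exp(mu)  # population median of a lognormal
    estimators = {
        "GM": geometric_mean,
        "BCGM": bias_corrected_gm,
        "SM": sample_median,
    }
    total = {name: 0.0 for name in estimators}
    sq_err = {name: 0.0 for name in estimators}
    for _ in range(trials):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        for name, estimate in estimators.items():
            value = estimate(sample)
            total[name] += value
            sq_err[name] += (value - true_median) ** 2
    return {
        name: {
            "bias": total[name] / trials - true_median,
            "mse": sq_err[name] / trials,
        }
        for name in estimators
    }


if __name__ == "__main__":
    # Hypothetical population: mu = 0, sigma = 1.5, smallest sample size n = 4.
    for name, result in simulate(0.0, 1.5, n=4, trials=10000).items():
        print(f"{name}: bias={result['bias']:.3f}  mse={result['mse']:.3f}")
```

Rerunning `simulate` with larger `n` or smaller `sigma` shows how the gap between the estimators narrows, which is the sample-size and variance dependence the study set out to measure.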
