The probability is determined that the sample distribution function (df) of a random sample of any size $n$, drawn from the uniform distribution, lies below a given line segment of any slope over some $(\alpha, \beta) \subset [0, 1]$ (Section 1, Theorem 1.1 ff.). Probabilities of related events, also under conditioning, are derived. It is well known that results of this type are equivalent to similar ones for random samples from any continuous df. A catalogue of equivalent formulae is given, the various versions being advantageous on certain ranges of the parameters. These results rest upon and generalize a formula (Theorem 2.1 below) of Dempster (1959), Dwass (1959), and Pyke (1959) (his Lemma 1), which gives, for any $n$, an explicit expression for the probability that the sample df lies below a (straight) line extending over the entire unit interval. Beyond this, the proof uses familiar properties of order statistics, the whole argument being essentially combinatorial. The present result also generalizes, in particular, results by Wald and Wolfowitz (1939) and by Birnbaum and Tingey (1951) (in both papers the sample df lies below a line segment of slope 1 over $[0, 1]$; the latter paper, which improves upon the first, is at the root of the approach of the present article as well as of the papers by Dempster, Dwass, and Pyke), by Smirnov (1944 and 1961) (sample df below a line segment of slope 1 over $(0, 1)$ or $(\alpha, 1)$), by Chang (1955) (line segment of any nonnegative slope joining the origin with some point in the open unit square $I_2^0$), by Csörgő (1965) (the line segment, of any slope, may also end in the point $(1, 1)$), and by Birnbaum and Lientz (1969) (line segment of any nonnegative slope through the origin, over any subinterval of the unit interval). Apparently the first author to determine explicitly the probability that the sample df lies below a line segment over an arbitrary subinterval $(\alpha, \beta) \subset [0, 1]$ was Takács (1964) (Theorem 3; see also Takács (1967), pages 176-178). (The author is indebted to Professor J. Kiefer for reminding him of this reference.) However, Takács imposed the (not very crucial) restriction that the slope $\gamma$ of the line be $\geqq 1$. Moreover, his formula contains a double sum, whereas some of ours contain a single one; this proves to be of great advantage in the applications we have made so far.

Theorem 1.2 below gives certain conditional probabilities that the sample df lies below a line segment over some subinterval of $(0, 1)$. These probability expressions (like any of those described above), for a suitable sequence of line segments depending on $n$, tend, as $n \rightarrow \infty$, to the probability that the Brownian bridge (or conditioned Wiener) process lies below a line segment (see [18], in particular pages 182-183). This follows from the Doob-Donsker theorem on weak convergence of probability measures (see also [13]). These asymptotic results provide a starting point, different from and possibly simpler than that of Section 1, for computing approximate probabilities that the sample df lies below some curved line (for an application and some of the asymptotic formulae compare [7] and [17]). It is the objective of a later paper to determine the error of this approximation. Theorem 1.3 gives similar conditional probability expressions that the sample probability (a generalization of the sample df, defined below) for intervals of variable lengths lies below a line segment.
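As an informal illustration (not part of the paper), the following Python sketch estimates by simulation the event treated in Theorem 1.1: that the sample df of a uniform sample of size $n$ stays below the line $t \mapsto a + \gamma t$ on $(\alpha, \beta)$. The function names, the Monte Carlo approach, and the restriction to nonnegative slope are simplifying choices made here, not the paper's explicit formulae.

```python
import numpy as np

rng = np.random.default_rng(0)

def below_line(u, a, gamma, alpha, beta):
    """Check whether the sample df F_n of the uniform sample u stays
    (weakly) below the line t -> a + gamma * t on the open interval
    (alpha, beta).  For gamma >= 0 it suffices to check the left
    endpoint and the jump points of F_n inside the interval, since
    F_n is a right-continuous step function and the line increases."""
    u = np.sort(u)
    n = len(u)
    # F_n at the left endpoint alpha
    if np.searchsorted(u, alpha, side="right") / n > a + gamma * alpha:
        return False
    # F_n at each jump point strictly inside (alpha, beta)
    for i, x in enumerate(u, start=1):
        if alpha < x < beta and i / n > a + gamma * x:
            return False
    return True

def mc_probability(n, a, gamma, alpha, beta, reps=100_000):
    """Crude Monte Carlo estimate of the probability of the event above."""
    hits = sum(below_line(rng.uniform(size=n), a, gamma, alpha, beta)
               for _ in range(reps))
    return hits / reps

# Example: a slope-1 line over the whole interval (the Birnbaum-Tingey setting)
print(mc_probability(n=10, a=0.2, gamma=1.0, alpha=0.0, beta=1.0))
```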
It is these formulae that are needed in a new version of the proof of the Bahadur-Kiefer representation theorem for sample quantiles [1]. The formulae derived here are explicit (though involved), in contrast to the recursive ones first considered in [12] and extended more recently, e.g., in [9] and [25]. They have been used, e.g., to compute explicitly the probability that the sample df lies below a polygon (compare a forthcoming paper of the author). Another application is the derivation of asymptotic formulae paralleling and generalizing, e.g., those of Rényi and Csörgő ([4] and [10]). Implications for stochastic processes (such as those studied in [19]) and for the theory of goodness-of-fit tests (compare [1]) are not considered. In [1] the statistic $F_n(x) - F(x)$, divided by its standard deviation, has also been proposed; certain functionals of this statistic may be studied by the method of the present paper. On the other hand, the method is not immediately applicable to the study of probabilities that the sample df lies, e.g., between two lines (for a recent paper on this compare [16]). The vast existing literature on Kolmogorov-Smirnov type statistics (for recent surveys compare [20] and the appendix of [12a]) is not surveyed here for possible applications beyond the few representative examples mentioned above.
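In the same illustrative spirit, the jump-point argument of the previous sketch extends to the polygon application mentioned above: for a nondecreasing piecewise-linear boundary it is enough to evaluate the sample df at the left endpoint of the interval and at its jump points. The knot representation of the boundary and the particular numbers below are assumptions of this sketch, not taken from the paper or its forthcoming companion.

```python
import numpy as np

def below_polygon(u, knots_t, knots_g):
    """Check whether the sample df F_n of the uniform sample u stays
    (weakly) below the nondecreasing polygon through the points
    (knots_t[k], knots_g[k]) on the open interval
    (knots_t[0], knots_t[-1]).  Same jump-point argument as before."""
    u = np.sort(u)
    n = len(u)
    t0, t1 = knots_t[0], knots_t[-1]
    g = lambda t: np.interp(t, knots_t, knots_g)   # piecewise-linear boundary
    if np.searchsorted(u, t0, side="right") / n > g(t0):
        return False
    for i, x in enumerate(u, start=1):
        if t0 < x < t1 and i / n > g(x):
            return False
    return True

# Crude Monte Carlo estimate for a two-segment polygon (illustrative values)
rng = np.random.default_rng(1)
knots_t, knots_g = [0.0, 0.5, 1.0], [0.1, 0.4, 1.1]
reps = 100_000
p_hat = sum(below_polygon(rng.uniform(size=10), knots_t, knots_g)
            for _ in range(reps)) / reps
print(p_hat)
```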