The probability is determined that the sample distribution function (df) of a random sample of any size $n$, drawn from the uniform distribution, lies below a given line segment of any slope over some $(\alpha, \beta) \subset [0, 1]$ (Section 1, Theorem 1.1 ff.). Probabilities of related events, also under conditioning, are derived. It is well known that results of this type are equivalent to similar ones for random samples from any continuous df. A catalogue of equivalent formulae is given, the various versions being advantageous on certain ranges of the parameters. These results rest upon and generalize a formula (Theorem 2.1 below) of Dempster (1959), Dwass (1959), and Pyke (1959) (his Lemma 1), which gives, for any $n$, an explicit expression for the probability that the sample df lies below a (straight) line extending over the entire unit interval. Beyond this, the proof uses familiar properties of order statistics, the whole argument being essentially combinatorial. The present result also generalizes, in particular, results by Wald and Wolfowitz (1939) and by Birnbaum and Tingey (1951) (in both papers the sample df lies below a line segment of slope 1 over $[0, 1]$; the latter paper, which improves upon the first, is at the root of the approach of the present article as well as of the papers by Dempster, Dwass, and Pyke), by Smirnov (1944 and 1961) (sample df below a line segment of slope 1 over $(0, 1)$ or $(\alpha, 1)$), by Chang (1955) (line segment of any nonnegative slope joining the origin with some point in the open unit square $I_2^0$), by Csörgő (1965) (the line segment, of any slope, may also end in the point $(1, 1)$), and by Birnbaum and Lientz (1969) (line segment of any nonnegative slope through the origin, over any subinterval of the unit interval). Apparently the first author to determine explicitly the probability that the sample df lies below a line segment over an arbitrary subinterval $(\alpha, \beta) \subset [0, 1]$ was Takács (1964) (Theorem 3; see also Takács (1967), pages 176-178). (The author is indebted to Professor J. Kiefer for reminding him of this reference.) However, Takács imposed the (not very crucial) restriction that the slope $\gamma$ of the line be $\geqq 1$. Moreover, his formula contains a double sum, whereas some of ours contain a single one; this proves to be of great advantage in the applications we have made so far.

Theorem 1.2 below gives certain conditional probabilities that the sample df lies below a line segment over some subinterval of $(0, 1)$. These probability expressions (like any of those described above), for a suitable sequence of line segments depending on $n$, tend, as $n \rightarrow \infty$, to the probability that the Brownian bridge (or conditioned Wiener) process lies below a line segment (see [18], in particular pages 182-183). This follows from the Doob-Donsker theorem on weak convergence of probability measures (see also [13]). These asymptotic results provide a starting point, different from and possibly simpler than that of Section 1, for computing approximate probabilities that the sample df lies below some curved line (for an application and some of the asymptotic formulae compare [7] and [17]). It is the objective of a later paper to determine the error of this approximation. Theorem 1.3 gives similar conditional probability expressions that the sample probability (a generalization of the sample df, defined below) for intervals of variable lengths lies below a line segment.
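As an informal illustration (not part of the paper), the following Python sketch estimates by simulation the event treated in Theorem 1.1: that the sample df of a uniform sample of size $n$ stays below the line $t \mapsto a + \gamma t$ on $(\alpha, \beta)$. The function names, the Monte Carlo approach, and the restriction to nonnegative slope are simplifying choices made here, not the paper's explicit formulae.

```python
import numpy as np

rng = np.random.default_rng(0)

def below_line(u, a, gamma, alpha, beta):
    """Check whether the sample df F_n of the uniform sample u stays
    (weakly) below the line t -> a + gamma * t on the open interval
    (alpha, beta).  For gamma >= 0 it suffices to check the left
    endpoint and the jump points of F_n inside the interval, since
    F_n is a right-continuous step function and the line increases."""
    u = np.sort(u)
    n = len(u)
    # F_n at the left endpoint alpha
    if np.searchsorted(u, alpha, side="right") / n > a + gamma * alpha:
        return False
    # F_n at each jump point strictly inside (alpha, beta)
    for i, x in enumerate(u, start=1):
        if alpha < x < beta and i / n > a + gamma * x:
            return False
    return True

def mc_probability(n, a, gamma, alpha, beta, reps=100_000):
    """Crude Monte Carlo estimate of the probability of the event above."""
    hits = sum(below_line(rng.uniform(size=n), a, gamma, alpha, beta)
               for _ in range(reps))
    return hits / reps

# Example: a slope-1 line over the whole interval (the Birnbaum-Tingey setting)
print(mc_probability(n=10, a=0.2, gamma=1.0, alpha=0.0, beta=1.0))
```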
It is these formulae that are needed in a new version of the proof of the Bahadur-Kiefer representation theorem for sample quantiles [1]. The formulae derived here are explicit (though involved), in contrast to the recursive ones first considered in [12] and extended more recently, e.g., in [9] and [25]. They have been used, e.g., to compute explicitly the probability that the sample df lies below a polygon (compare a forthcoming paper of the author). Another application is the derivation of asymptotic formulae paralleling and generalizing, e.g., those of Rényi and Csörgő ([4] and [10]). Implications for stochastic processes (such as those studied in [19]) and for the theory of goodness-of-fit tests (compare [1]) are not considered. In [1] the statistic $F_n(x) - F(x)$, divided by its standard deviation, has also been proposed; certain functionals of this statistic may be studied by the method of the present paper. On the other hand, the method is not immediately applicable to the study of probabilities that the sample df lies, e.g., between two lines (for a recent paper on this compare [16]). The vast existing literature on Kolmogorov-Smirnov type statistics (for recent surveys compare [20] and the appendix of [12a]) is not surveyed here for possible applications beyond the few representative examples mentioned above.
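In the same illustrative spirit, the jump-point argument of the previous sketch extends to the polygon application mentioned above: for a nondecreasing piecewise-linear boundary it is enough to evaluate the sample df at the left endpoint of the interval and at its jump points. The knot representation of the boundary and the particular numbers below are assumptions of this sketch, not taken from the paper or its forthcoming companion.

```python
import numpy as np

def below_polygon(u, knots_t, knots_g):
    """Check whether the sample df F_n of the uniform sample u stays
    (weakly) below the nondecreasing polygon through the points
    (knots_t[k], knots_g[k]) on the open interval
    (knots_t[0], knots_t[-1]).  Same jump-point argument as before."""
    u = np.sort(u)
    n = len(u)
    t0, t1 = knots_t[0], knots_t[-1]
    g = lambda t: np.interp(t, knots_t, knots_g)   # piecewise-linear boundary
    if np.searchsorted(u, t0, side="right") / n > g(t0):
        return False
    for i, x in enumerate(u, start=1):
        if t0 < x < t1 and i / n > g(x):
            return False
    return True

# Crude Monte Carlo estimate for a two-segment polygon (illustrative values)
rng = np.random.default_rng(1)
knots_t, knots_g = [0.0, 0.5, 1.0], [0.1, 0.4, 1.1]
reps = 100_000
p_hat = sum(below_polygon(rng.uniform(size=10), knots_t, knots_g)
            for _ in range(reps)) / reps
print(p_hat)
```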