A two-dimensional Kolmogorov-Smirnov test

Raul Lopes

doi:10.22323/1.050.0045

Abstract

Goodness-of-fit statistics measure the compatibility of random samples with a theoretical probability distribution function. The classical one-dimensional Kolmogorov-Smirnov test is a non-parametric statistic for comparing two empirical distributions, which takes the largest absolute difference between the two cumulative distribution functions as its measure of disagreement. Adapting this test to more than one dimension is a challenge because there are 2^d − 1 independent ways of defining a cumulative distribution function when d dimensions are involved. In this paper, three variations on the Kolmogorov-Smirnov test for multi-dimensional data sets are surveyed: Peacock’s test [1], which runs in O(n^3); Fasano and Franceschini’s test [2], which runs in O(n^2); and Cooke’s test, which also runs in O(n^2). We prove that Cooke’s algorithm runs in O(n^2), contrary to his claim that it runs in O(n lg n). We also compare these algorithms with ROOT’s version of the Kolmogorov-Smirnov test.
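To make the quantities in the abstract concrete, the following Python sketch computes the one-dimensional two-sample statistic and naive versions of two quadrant-based two-dimensional statistics. It is not the paper's code, ROOT's implementation, or Cooke's algorithm. The function names (ks_1d, quadrant_fractions, max_quadrant_diff, ks_2d_ff, ks_2d_peacock) are hypothetical, the rule for points lying exactly on a quadrant boundary is an assumption, and combining the two maxima in ks_2d_ff by averaging is an assumed two-sample rule that the abstract does not state. In two dimensions the four quadrant fractions around any origin sum to 1, so only 2^2 − 1 = 3 of them are independent. Written this naively, the Peacock-style version examines O(n^2) candidate origins at O(n) cost each, giving O(n^3), while the Fasano-Franceschini-style version examines only the O(n) observed points, giving O(n^2).

```python
"""Illustrative two-sample Kolmogorov-Smirnov statistics (a sketch only).

All names here are hypothetical; none is ROOT's API or the paper's code.
Only the D statistics are computed, not significance levels.
"""
from bisect import bisect_right
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


def ks_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    # Both ECDFs are right-continuous step functions that jump only at the
    # pooled sample points, so the supremum is attained at one of them.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def quadrant_fractions(sample: Sequence[Point], x0: float, y0: float) -> List[float]:
    """Fraction of the sample in each quadrant around the origin (x0, y0).

    Assumed tie convention: a point counts as "right" only if x > x0 and as
    "up" only if y > y0, so points on a boundary fall left/down.
    """
    counts = [0, 0, 0, 0]  # up-right, up-left, down-left, down-right
    for x, y in sample:
        if y > y0:
            counts[0 if x > x0 else 1] += 1
        else:
            counts[3 if x > x0 else 2] += 1
    return [c / len(sample) for c in counts]


def max_quadrant_diff(a: Sequence[Point], b: Sequence[Point],
                      origins: Iterable[Point]) -> float:
    """Max over origins and quadrants of |fraction of a - fraction of b|."""
    best = 0.0
    for x0, y0 in origins:
        fa = quadrant_fractions(a, x0, y0)
        fb = quadrant_fractions(b, x0, y0)
        best = max(best, max(abs(p - q) for p, q in zip(fa, fb)))
    return best


def ks_2d_ff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Fasano-Franceschini style: origins at the observed points, O(n^2).

    Averaging the maxima taken over each sample's own points is an
    assumed two-sample rule, not one taken from the paper.
    """
    return 0.5 * (max_quadrant_diff(a, b, a) + max_quadrant_diff(a, b, b))


def ks_2d_peacock(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Peacock style: origins at every (x_i, y_j) of the pooled sample.

    O(n^2) origins times O(n) counting gives O(n^3) in this naive form.
    """
    pooled = list(a) + list(b)
    xs = [p[0] for p in pooled]
    ys = [p[1] for p in pooled]
    origins = [(x, y) for x in xs for y in ys]
    return max_quadrant_diff(a, b, origins)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
    b = [(rng.gauss(0.5, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
    print("1-D on x:", ks_1d([p[0] for p in a], [p[0] for p in b]))
    print("FF:", ks_2d_ff(a, b), "Peacock:", ks_2d_peacock(a, b))
```

Because every observed point (x_i, y_i) is also one of the Peacock-style grid origins, ks_2d_peacock(a, b) is never smaller than ks_2d_ff(a, b) under these conventions. The price is the extra factor of n in the number of origins examined.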
