Probability binning and testing agreement between multivariate immunofluorescence histograms: extending the chi-squared test.

Keith A Baggerly

doi:10.1002/1097-0320(20011001)45:2<141::aid-cyto1156>3.0.co;2-m

Abstract

A key problem in immunohistochemistry is assessing when two sample histograms are significantly different. One test that is commonly used for this purpose in the univariate case is the chi-squared test. Comparing multivariate distributions is qualitatively harder, as the "curse of dimensionality" means that the number of bins can grow exponentially. For the chi-squared test to be useful, data-dependent binning methods must be employed. An example of how this can be done is provided by the "probability binning" method of Roederer et al. (1,2,3). We derive the theoretical distribution of the probability binning statistic, giving it a more rigorous foundation. We show that the null distribution is a scaled chi-square, and show how it can be related to the standard chi-squared statistic. A small simulation shows how the theoretical results can be used to (a) modify the probability binning statistic to make it more sensitive and (b) suggest variant statistics which, while still exploiting the data-dependent strengths of the probability binning procedure, may be easier to work with. The probability binning procedure effectively uses adaptive binning to locate structure in high-dimensional data. The derivation of a theoretical basis provides a more detailed interpretation of its behavior and renders the probability binning method more flexible.

Full Text