Abstract
We develop a test statistic for testing the equality of two population mean vectors in the “large-p-small-n” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which breaks down the classic Hotelling T² test. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.
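The rank-deficiency mentioned above can be seen in a small numerical sketch. It is illustrative only and is not part of the proposed procedure: the sample sizes, the dimension, the Gaussian data, and the helper name `pooled_covariance` are all assumptions chosen for the example. With n1 + n2 - 2 < p, the pooled sample covariance matrix has rank at most n1 + n2 - 2, so it cannot be inverted as Hotelling's statistic requires.

```python
import numpy as np


def pooled_covariance(x1, x2):
    """Pooled sample covariance of two samples (rows are observations).

    Hypothetical helper for illustration; not taken from the article.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    c1 = x1 - x1.mean(axis=0)  # center each sample at its own mean
    c2 = x2 - x2.mean(axis=0)
    return (c1.T @ c1 + c2.T @ c2) / (n1 + n2 - 2)


# Assumed toy setting: far more components than observations.
rng = np.random.default_rng(0)
n1, n2, p = 10, 12, 50
x1 = rng.standard_normal((n1, p))
x2 = rng.standard_normal((n2, p))

s_pooled = pooled_covariance(x1, x2)
rank = np.linalg.matrix_rank(s_pooled)

# The p x p matrix is singular: its rank is bounded by n1 + n2 - 2 < p.
print(f"p = {p}, rank of pooled covariance = {rank}")
```

Because the matrix is singular whenever p exceeds n1 + n2 - 2, any procedure for this setting has to avoid inverting it, which is the motivation stated in the abstract.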
Highlights
In many applications it is desirable to test whether the means of high-dimensional random vectors are the same in two populations.
The following conditions are assumed in deriving the asymptotic distribution of the test statistic Tn.
The following theorem establishes the asymptotic normality of the test statistic under the appropriate centering and scaling.
Summary
In many applications it is desirable to test whether the means of high-dimensional random vectors are the same in two populations. Bai & Saranadasa (1996) presented a test statistic which uses only the trace of the sample covariance matrix and performs well when the random vectors of each population can be expressed as linear transformations of zero-mean i.i.d. random vectors with identity covariance matrices. Dense-but-weak signal settings do exist, for example in the analysis of copy number variations, where mildly elevated or reduced numbers of DNA segment copies in cancer patients are believed to occur over regions of the chromosome rather than at isolated points (Olshen et al. (2004), Baladandayuthapani et al. (2010)). It is for such cases that our test is designed. Full details of the proofs may be found in the Supplementary Material.
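To make the trace-based idea attributed to Bai & Saranadasa (1996) concrete, the following is a minimal sketch of the unstandardized form of their statistic as it is commonly stated: the squared distance between the sample means minus (n1 + n2)/(n1 n2) times the trace of the pooled sample covariance. The function name `bs_statistic` and the toy data are assumptions for illustration, the variance standardization needed for an actual test is omitted, and this is not the generalized component test developed in the article.

```python
import numpy as np


def bs_statistic(x1, x2):
    """Unstandardized trace-based two-sample statistic (illustrative sketch).

    Returns ||mean(x1) - mean(x2)||^2 - tau * tr(S), where S is the pooled
    sample covariance and tau = (n1 + n2) / (n1 * n2). Rows are observations.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    c1 = x1 - x1.mean(axis=0)
    c2 = x2 - x2.mean(axis=0)
    # tr(S) equals the sum of squared centered entries over n1 + n2 - 2,
    # so the p x p covariance matrix never has to be formed or inverted.
    trace_s = (np.sum(c1 ** 2) + np.sum(c2 ** 2)) / (n1 + n2 - 2)
    tau = (n1 + n2) / (n1 * n2)
    return float(diff @ diff - tau * trace_s)


# Assumed toy data: equal means, so the statistic should fluctuate around zero.
rng = np.random.default_rng(1)
x1 = rng.standard_normal((15, 200))
x2 = rng.standard_normal((20, 200))
print(bs_statistic(x1, x2))
```

Only the trace of the covariance estimate enters the computation, which is why the rank-deficiency of the full matrix is not an obstacle for this kind of statistic.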