Abstract

We develop a test statistic for testing the equality of two population mean vectors in the “large-p-small-n” setting. Such a test must surmount the rank-deficiency of the sample covariance matrix, which causes the classical Hotelling T² test to break down. The proposed procedure, called the generalized component test, avoids full estimation of the covariance matrix by assuming that the p components admit a logical ordering such that the dependence between components is related to their displacement. The test is shown to be competitive with other recently developed methods under ARMA and long-range dependence structures and to achieve superior power for heavy-tailed data. The test does not assume equality of covariance matrices between the two populations, is robust to heteroscedasticity in the component variances, and requires very little computation time, which allows its use in settings with very large p. Analyses of mitochondrial calcium concentration in mouse cardiac muscles over time and of copy number variations in a glioblastoma multiforme dataset from The Cancer Genome Atlas are carried out to illustrate the test. Supplementary materials for this article are available online.

Highlights

  • In many applications it is desirable to test whether the means of high-dimensional random vectors are the same in two populations

  • The following conditions are assumed in deriving the asymptotic distribution of the test statistic Tn

  • The following theorem establishes the asymptotic normality of the test statistic under the appropriate centering and scaling

Summary

Introduction

In many applications it is desirable to test whether the means of high-dimensional random vectors are the same in two populations. Bai & Saranadasa (1996) presented a test statistic that uses only the trace of the sample covariance matrix and performs well when the random vectors of each population can be expressed as linear transformations of zero-mean i.i.d. random vectors with identity covariance matrices. Dense-but-weak signal settings do exist, for example in the analysis of copy number variations, where mildly elevated or reduced numbers of DNA segment copies in cancer patients are believed to occur over regions of the chromosome rather than at isolated points (Olshen et al., 2004; Baladandayuthapani et al., 2010). It is for such cases that our test is designed. Full details of the proofs may be found in the Supplementary Material.
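The motivation for avoiding the full covariance matrix can be illustrated with a small numerical sketch. It is not the generalized component test itself, only a demonstration on simulated data, assuming NumPy; the sample sizes, dimension, and the helper name `pooled_covariance` are illustrative choices and not taken from the article. When p exceeds n1 + n2 - 2, the pooled sample covariance matrix cannot have full rank, so it cannot be inverted as Hotelling's T² requires, while its trace remains well defined.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    xc = x - x.mean(axis=0)  # center each sample at its own mean
    yc = y - y.mean(axis=0)
    return (xc.T @ xc + yc.T @ yc) / (n1 + n2 - 2)


# Illustrative sizes only: far more components than observations.
rng = np.random.default_rng(0)
n1, n2, p = 10, 12, 50
x = rng.standard_normal((n1, p))
y = rng.standard_normal((n2, p))

s = pooled_covariance(x, y)
rank = np.linalg.matrix_rank(s)  # bounded above by n1 + n2 - 2
trace = np.trace(s)              # still available when s is singular

print(f"p = {p}, rank of pooled covariance = {rank}, trace = {trace:.2f}")
```

Because the rank is bounded by n1 + n2 - 2 regardless of p, the singularity is structural rather than a numerical accident, which is why tests for this setting rely on quantities such as the trace or on structural assumptions about the dependence.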

Test Statistic
Main Results
Technical Details
Simulation Studies
Performance under normality
Effect of skewness
Effect of heavy-tailedness
Effect of heteroscedasticity
Effect of unequal covariance matrices
Copy Number Variation Example
Mitochondrial Calcium Concentration
Conclusions