Abstract

Researchers often use a two-step process to analyze multivariate data. First, dimensionality is reduced using a technique such as principal component analysis, followed by a group comparison using a -test or analysis of variance. Although this practice is often discouraged, the statistical properties of this procedure are not well understood, starting with the hypothesis being tested. We suggest that this approach might be considering two distinct hypotheses, one of which is a global test of no differences in the mean vectors, and the other being a focused test of a specific linear combination where the coefficients have been estimated from the data. We study the asymptotic properties of the two-sample -statistic for these two scenarios, assuming a nonsparse setting. We show that the size of the global test agrees with the presumed level but that the test has poor power. In contrast, the size of the focused test can be arbitrarily distorted with certain mean and covariance structures. A simple method is provided to correct the size of the focused test. Data analyses and simulations are used to illustrate the results. Recommendations on the use of this two-step method and the related use of principal components for prediction are provided.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call