Abstract

Many linear dimension reduction methods proposed in the literature can be formulated using an appropriate pair of scatter matrices. The eigen-decomposition of one scatter matrix with respect to another is then often used to determine the dimension of the signal subspace and to separate the signal and noise parts of the data. Three popular dimension reduction methods, namely principal component analysis (PCA), fourth-order blind identification (FOBI) and sliced inverse regression (SIR), are considered in detail, and the first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space. The limiting null distributions of the test statistics are discussed and novel bootstrap strategies are suggested for the small sample cases. In all three cases, consistent test-based estimates of the signal subspace dimension are introduced as well. The asymptotic and bootstrap tests are illustrated in real data examples.

Highlights

  • Dimension reduction (DR) plays an increasingly important role in high dimensional data analysis

  • In independent component analysis (ICA), the fourth-order blind identification (FOBI) method of Cardoso [6] uses the regular covariance matrix together with a scatter matrix based on fourth moments, and the eigenvalues provide measures of marginal kurtosis

  • For the principal component analysis (PCA)-I strategy applied to the covariance matrix, similar arguments can be used to get the same approximations for the distributions of n(p − k)T_k(X*)/(2d²) and n(p − k)T_k(X)/(2d²)


Summary

Introduction

Dimension reduction (DR) plays an increasingly important role in high dimensional data analysis. In independent component analysis (ICA), the fourth-order blind identification (FOBI) method of Cardoso [6] uses the regular covariance matrix together with a scatter matrix based on fourth moments, and the eigenvalues provide measures of marginal kurtosis. Other examples of supervised dimension reduction methods are canonical correlation analysis (CCA), sliced average variance estimation (SAVE) and principal Hessian directions (PHD), and they can all be formulated using two scatter matrices. For these methods and for estimation of the dimension of the signal subspace with regular bootstrap sampling, see Li [23], Cook and Weisberg [9], Li [24], Bura and Cook [4], Cook [8], and Zhu et al. [46,47]. The distribution of x is elliptical if x = Az + b, where A ∈ R^(p×p) is nonsingular, b ∈ R^p, and z ∈ R^p has a spherical distribution.
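The two-scatter construction behind FOBI can be sketched in a few lines: whiten the data with the inverse square root of the regular covariance matrix, form a fourth-moment scatter matrix on the whitened data, and eigendecompose it. This is a minimal illustrative sketch, not the authors' implementation; the function name `fobi_pair` and the scaling by (p + 2) (which makes each eigenvalue equal 1 for a Gaussian component) are choices made here for illustration.

```python
import numpy as np

def fobi_pair(X):
    """Eigendecompose a fourth-moment scatter matrix with respect to
    the regular covariance matrix (FOBI-style two-scatter approach)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n                       # regular covariance matrix
    # Symmetric inverse square root of the covariance for whitening.
    vals, vecs = np.linalg.eigh(cov)
    cov_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = Xc @ cov_inv_sqrt                     # whitened data
    r2 = np.sum(Z ** 2, axis=1)               # squared Mahalanobis distances
    # Fourth-moment scatter; the 1/(p + 2) factor normalizes it to the
    # identity for Gaussian data, so eigenvalues measure marginal kurtosis.
    cov4 = (Z * r2[:, None]).T @ Z / (n * (p + 2))
    eigvals, eigvecs = np.linalg.eigh(cov4)
    # Return eigenvalues in decreasing order with the matching directions
    # mapped back to the original coordinates.
    return eigvals[::-1], (cov_inv_sqrt @ eigvecs)[:, ::-1]
```

Eigenvalues far from 1 then flag non-Gaussian (signal) directions, while a block of eigenvalues near 1 is consistent with a Gaussian noise subspace, which is what the tests for subspace dimension exploit.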

Scatter matrices
Testing for subspace dimension in PCA
Asymptotic tests for dimension
Bootstrap tests for dimension
An example
Testing for subspace dimension in FOBI
Testing for subspace dimension in SIR
A bootstrap test for dimension
Final remarks