Abstract

In this work, we develop inferential tools for determining the correct number of principal components under a general noisy latent variable model, which includes the noisy independent component model as a special case. The problem is approached using hypothesis testing, and we provide both a large-sample test and several resampling-based alternatives. Simulations and an application to sound data show that both types of approaches keep the desired levels and have good power.

Highlights

  • Many modern high-dimensional data sets are too large to be directly subjected to various methods of multivariate data analysis. This is especially common with independent component analysis (ICA; Comon & Jutten, 2010; Nordhausen & Oja, 2018), where methods often have computational complexities that grow too fast with the number of variables p to be of any practical use.

  • In Algorithm 1, we propose two different ways of sampling from a spherical distribution: (a) replacing the vectors R^⊤x_i with independent and identically distributed Gaussian vectors, which can be seen as parametric bootstrapping; and (b) spherifying the original R^⊤x_i by random orthogonal transformations.

  • We developed inference tools to detect the true latent signal dimensionality using principal component analysis (PCA).
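
The two sampling schemes mentioned in the highlights can be illustrated with a minimal NumPy sketch. It is not the paper's Algorithm 1: the function names, the row-wise layout (row i of `z` holds the projected vector R^⊤x_i), the assumption that the rows are already centered, and the pooled variance estimate used to scale the Gaussian draws are all assumptions made here for illustration.

```python
import numpy as np


def haar_orthogonal(q, rng):
    """Draw a q x q orthogonal matrix from the Haar measure via QR."""
    a = rng.standard_normal((q, q))
    qmat, r = np.linalg.qr(a)
    # Rescale each column by the sign of the matching diagonal entry of R
    # so that the resulting matrix is Haar distributed.
    return qmat * np.sign(np.diag(r))


def gaussian_replacement(z, rng):
    """Scheme (a): replace the rows of z with i.i.d. spherical Gaussian vectors.

    The common variance is estimated by pooling all entries of z
    (an assumption of this sketch; z is taken to be centered).
    """
    n, q = z.shape
    sigma2 = np.mean(z ** 2)
    return np.sqrt(sigma2) * rng.standard_normal((n, q))


def spherify(z, rng):
    """Scheme (b): rotate each row of z by its own random orthogonal matrix."""
    n, q = z.shape
    out = np.empty((n, q), dtype=float)
    for i in range(n):
        out[i] = haar_orthogonal(q, rng) @ z[i]
    return out


rng = np.random.default_rng(0)
# Hypothetical non-spherical input: 50 vectors in 3 dimensions with unequal scales.
z = rng.standard_normal((50, 3)) * np.array([1.0, 2.0, 3.0])
z_gauss = gaussian_replacement(z, rng)
z_sphere = spherify(z, rng)
```

Scheme (b) keeps the length of every original vector and randomizes only its direction, whereas scheme (a) discards the original vectors entirely and keeps only an overall scale.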


Summary

INTRODUCTION

Many modern high-dimensional data sets are too large to be directly subjected to various methods of multivariate data analysis. This is especially common with independent component analysis (ICA; Comon & Jutten, 2010; Nordhausen & Oja, 2018), where methods often have computational complexities that grow too fast with the number of variables p to be of any practical use. Although the use of a preliminary PCA step is very popular in applied data analysis, no inferential methods for the selection of a suitable dimension under a general noisy model can be found in the literature. Similar studies include Luo and Li (2016) and Nordhausen, Oja, Tyler, and Virta (2017), which provide tools to test for the number of non-Gaussian components in an ICA model with internal noise. These techniques were extended in Matilainen, Nordhausen, and Virta (2018), Nordhausen and Virta (2018), and Virta and Nordhausen (2019) to the second-order source separation model to test for white noise.

NOISY ICA MODEL
ASYMPTOTIC NULL DISTRIBUTION
BOOTSTRAP NULL DISTRIBUTION
NUMERICAL EXAMPLES
CONCLUSION
DATA AVAILABILITY STATEMENT