Abstract

Independent component analysis (ICA) has recently attracted much attention in the statistical literature as an appealing alternative to elliptical models. Whereas k-dimensional elliptical densities depend on a single unspecified radial density, k-dimensional independent component distributions involve k unspecified component densities. In practice, for given sample size n and dimension k, this makes the statistical analysis much harder. We focus here on the estimation, from an independent sample, of the mixing/demixing matrix of the model. Traditional methods (FOBI, Kernel-ICA, FastICA) mainly originate from the engineering literature. Their consistency requires moment conditions, they are poorly robust, and they do not achieve any type of asymptotic efficiency. When based on robust scatter matrices, the two-scatter methods developed by Oja, Sirkia, and Eriksson (2006) and Nordhausen, Oja, and Ollila (2008) enjoy better robustness features, but their optimality properties remain unclear. The “classical semiparametric” approach of Chen and Bickel (2006), on the contrary, achieves semiparametric efficiency, but requires estimating the densities of the k unobserved independent components. In response, an efficient (signed-)rank-based approach was proposed by Ilmonen and Paindaveine (2011) for the case of symmetric component densities. Their estimators perform quite well, but unfortunately fail to be root-n consistent as soon as one of the component densities violates the symmetry assumption. In this article, using ranks rather than signed ranks, we extend their approach to the asymmetric case and propose a one-step R-estimator for ICA mixing matrices. The finite-sample performance of these estimators is investigated and compared with that of existing methods under moderately large sample sizes.
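Because the move from signed ranks to ranks is the key step of the extension, a small sketch may help fix ideas. It computes both statistics for a toy sample; the helper names `ranks` and `signed_ranks` are hypothetical, ties are assumed absent (continuous data), and it illustrates the two statistics only, not the authors' estimator.

```python
import math


def ranks(x):
    """Return the ranks 1..n of the observations in x (no ties assumed)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def signed_ranks(x):
    """Return sign(x_i) times the rank of |x_i| among |x_1|, ..., |x_n|."""
    abs_ranks = ranks([abs(v) for v in x])
    return [(1 if v > 0 else -1) * ra for v, ra in zip(x, abs_ranks)]


x = [0.4, -1.2, 2.5, -0.1]
print(ranks(x))         # [3, 1, 4, 2]
print(signed_ranks(x))  # [2, -3, 4, -1]

# Ranks are unchanged by strictly increasing transformations such as exp or a shift.
print(ranks([math.exp(v) for v in x]) == ranks(x))  # True
print(ranks([v + 1 for v in x]) == ranks(x))        # True

# Signed ranks depend on where zero (the assumed centre of symmetry) lies.
print(signed_ranks([v + 1 for v in x]))             # [3, -1, 4, 2]
```

Signed ranks are computed relative to zero, the assumed centre of symmetry, so they are distribution-free only under symmetry; ranks of an i.i.d. continuous sample need no such assumption, which is why they remain usable with skewed component densities.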
Particularly good performance is obtained from a version involving data-driven scores that take into account the skewness and kurtosis of the residuals. Finally, we show, through an empirical exercise, that our methods may also provide excellent results in a context such as image analysis, where the basic assumptions of ICA are quite unlikely to hold. Supplementary materials for this article are available online.
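The data-driven scores are described as accounting for the skewness and kurtosis of the residuals. How the article turns these summaries into scores is not reproduced here; as a hedged sketch, the following only computes the moment-based skewness and excess kurtosis of each residual component, using hypothetical helper names and plain moment estimators (divisor n).

```python
def moment_summary(residuals):
    """Return (skewness, excess kurtosis) of one residual series, using divisor n."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def summarize_components(residual_rows):
    """Apply moment_summary to each column (one estimated component per column)."""
    k = len(residual_rows[0])
    return [moment_summary([row[j] for row in residual_rows]) for j in range(k)]


rows = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0)]
for j, (skew, exkurt) in enumerate(summarize_components(rows)):
    print(f"component {j}: skewness {skew:.3f}, excess kurtosis {exkurt:.3f}")
# component 0: skewness 0.000, excess kurtosis -1.300
# component 1: skewness 1.500, excess kurtosis 0.250
```

A symmetric component (column 0) has zero skewness, while the lopsided column 1 is clearly skewed; summaries of this kind are what a skewness- and kurtosis-aware choice of scores would react to.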

Highlights

  • 1.1 Independent Component Analysis (ICA): The traditional Gaussian model for noise, where a k-dimensional error term e is N(0, Σ), can be extended in two main directions

  • Either the elliptical density contours of the multinormal are preserved, and e is assumed to be elliptically symmetric with respect to the origin, with unspecified radial density f

  • This makes the statistical analysis of models based on independent component noise significantly harder than its elliptical counterpart
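To make the contrast between the two noise models concrete, here is a minimal simulation sketch using only Python's standard library. The elliptical draw uses one radial distribution for all directions, whereas the independent component draw needs one density per coordinate. The specific densities, the matrix `Lam` standing in for the mixing (or scatter square-root) matrix, and the function names are all assumptions chosen for illustration.

```python
import random


def matvec(A, v):
    """Multiply the matrix A (list of rows) by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def elliptical_noise(S, radial, rng):
    """One draw e = S (R u): u uniform on the unit sphere, R from a single radial law."""
    g = [rng.gauss(0.0, 1.0) for _ in range(len(S))]
    norm = sum(v * v for v in g) ** 0.5
    u = [v / norm for v in g]          # uniform direction on the sphere
    r = radial(rng)                    # the one unspecified radial density
    return matvec(S, [r * v for v in u])


def ic_noise(Lam, components, rng):
    """One draw e = Lam z, where coordinate z_j comes from its own density components[j]."""
    z = [draw(rng) for draw in components]  # k independent, possibly skewed, components
    return matvec(Lam, z)


rng = random.Random(0)
Lam = [[1.0, 0.5], [0.0, 1.0]]  # hypothetical mixing (or scatter square-root) matrix

# Elliptical noise: one radial law (here exponential) shared by all directions.
ell = [elliptical_noise(Lam, lambda g: g.expovariate(1.0), rng) for _ in range(1000)]

# Independent component noise: k = 2 separate laws, one skewed and one symmetric.
ic = [ic_noise(Lam, [lambda g: g.expovariate(1.0) - 1.0,
                     lambda g: g.uniform(-1.0, 1.0)], rng) for _ in range(1000)]
print(len(ell), len(ic))  # 1000 1000
```

Any elliptical draw of this kind is symmetric about the origin, whereas the centred exponential coordinate makes the independent component noise skewed, which is the asymmetric situation the article targets.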


Summary

Introduction

The traditional Gaussian model for noise, where a k-dimensional error term e is N(0, Σ), can be extended in two main directions. For the two-scatter methods, under appropriate assumptions on the component densities, the resulting estimators are root-n consistent and, when based on robust scatter matrices, the method can be seen as a robustification of FOBI. The (signed-)rank-based method of Ilmonen and Paindaveine (2011) exploits the consistency properties of such estimators as FOBI or FastICA, or those based on the two-scatter method, and takes into account the invariance and distribution-freeness of ranks in order to bypass the costly step of estimating k densities. Their estimators (call them R+-estimators) achieve semiparametric efficiency at some selected k-tuple of component densities and yield very good finite-sample performance, even under moderately large sample sizes. Another solution to the identification problems inherent in ICA is adopted by Chen and Bickel (2006), who impose scaling restrictions on f and let their PCFICA algorithm (Chen and Bickel 2005) choose among the various observationally equivalent values of Λ⁻¹.
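The identification problem can be illustrated directly: multiplying a demixing matrix on the left by a permutation matrix and a nonsingular diagonal matrix yields an observationally equivalent matrix, so some convention is needed to pick one representative. The sketch below uses a hypothetical normalization (unit-norm rows, positive largest entry, rows ordered by the position of that entry); it is neither the scaling restriction of Chen and Bickel nor the one used in the article.

```python
def normalize_demixing(W):
    """Map W to one representative of its class {P D W : P permutation, D nonsingular diagonal}.

    Hypothetical convention: scale each row to unit Euclidean norm, flip its sign so
    that its largest-magnitude entry is positive, then order the rows by the column
    index of that entry.
    """
    keyed = []
    for row in W:
        norm = sum(v * v for v in row) ** 0.5
        unit = [v / norm for v in row]
        pivot = max(range(len(unit)), key=lambda j: abs(unit[j]))
        if unit[pivot] < 0:
            unit = [-v for v in unit]
        keyed.append((pivot, unit))
    keyed.sort()
    return [unit for _, unit in keyed]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices given as lists of rows."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


W = [[2.0, 1.0], [0.5, -3.0]]
# Swap the rows, multiply the new first row by -2 and the new second row by 0.5:
W_equiv = [[-1.0, 6.0], [1.0, 0.5]]
print(close(normalize_demixing(W), normalize_demixing(W_equiv)))  # True
```

The convention breaks down when two entries of a row have nearly equal magnitude, since the position of the largest entry then becomes unstable; it is meant only to show why some normalization is required before estimators can be compared.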

Group Invariance and semiparametric efficiency
Rank-based versions of central sequences
R-estimation of the mixing matrix
One-step R-estimation
Consistent estimation of cross-information quantities
Data-driven specification of reference density
Simulations
The preliminary estimators
The R-estimators
Simulation settings
An application in image analysis
Appendix A: Supplemental material
Findings
Appendix B: Supplemental material: further simulation results