Abstract
Conventional principal component analysis is highly susceptible to outliers. In particular, a single sufficiently outlying data point can draw the leading principal component toward itself. In this paper, we use asymptotics to study the effects of outliers for high dimension, low sample size data. The non-robust nature of conventional principal component analysis is verified by showing that it is inconsistent under multivariate Gaussian assumptions with a single spike in the covariance structure when a contaminating outlier is present. In the same setting, the robust method of spherical principal components is consistent with the population eigenvector for the spike model, even in the presence of contamination.
Highlights
Principal components analysis (PCA) is widely used for high dimensional data (Jolliffe [1]), including high dimension, low sample size (HDLSS) data
Both the sample mean and covariance are sensitive to outlying observations, and so classical PCA tends to be unreliable in the presence of outliers
Centering using the L1 M-estimate is recommended (Locantore et al. [6]), because it is intuitively consistent with spherical PCA
Summary
Principal components analysis (PCA) is widely used for high dimensional data (Jolliffe [1]), including high dimension, low sample size (HDLSS) data. Devlin and Gnanadesikan [2] developed a robust version of PCA by performing an eigenanalysis of a robust estimate of the covariance matrix. The asymptotic behavior of classical PCA for HDLSS data has been established by Jung and Marron [8] under various versions of the spike eigenvalue model, which has one or only a few large eigenvalues (Johnstone and Silverman [9]). They characterized, in terms of the spike parameter α, the conditions under which conventional PCA is consistent. Robustness with respect to outliers and spherical PCA (SPCA) are studied rigorously for the first time in the HDLSS asymptotic context.
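The contrast between classical and spherical PCA under contamination can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the authors' procedure: it assumes a single-spike Gaussian model whose leading eigenvalue is d^α with all other eigenvalues equal to 1, computes the L1 M-estimate (spatial median) with Weiszfeld iterations, and places one outlier along a coordinate axis orthogonal to the spike. The dimension, sample size, α, outlier size, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def spatial_median(X, n_iter=200, tol=1e-8):
    """L1 M-estimate of location, via Weiszfeld iterations (an assumed solver)."""
    m = X.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(dist, 1e-12)   # guard against division by zero
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def leading_direction(Z):
    """First right singular vector of Z, i.e. leading eigenvector of Z^T Z."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[0]

def classical_pc1(X):
    """Leading principal component after centering at the sample mean."""
    return leading_direction(X - X.mean(axis=0))

def spherical_pc1(X):
    """Center at the spatial median, project onto the unit sphere, then PCA."""
    Z = X - spatial_median(X)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return leading_direction(Z / np.maximum(norms, 1e-12))

def angle_deg(u, v):
    """Angle in degrees between two directions, ignoring sign."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, 0.0, 1.0)))

# Hypothetical HDLSS setting: d much larger than n, one spike along axis 0.
rng = np.random.default_rng(0)
d, n, alpha = 2000, 20, 1.5
lam1 = d ** alpha                       # assumed spike eigenvalue d^alpha
u1 = np.zeros(d)
u1[0] = 1.0                             # population leading eigenvector

X = rng.standard_normal((n, d))         # unit-variance Gaussian noise
X[:, 0] *= np.sqrt(lam1)                # inflate variance along the spike

# A single contaminating outlier, orthogonal to the spike direction.
outlier = np.zeros(d)
outlier[1] = 10.0 * np.sqrt(n * lam1)
X_cont = np.vstack([X, outlier])

ang_classical_clean = angle_deg(classical_pc1(X), u1)
ang_classical_cont = angle_deg(classical_pc1(X_cont), u1)
ang_spherical_cont = angle_deg(spherical_pc1(X_cont), u1)

print(f"classical PCA, clean data:        {ang_classical_clean:.1f} deg")
print(f"classical PCA, contaminated data: {ang_classical_cont:.1f} deg")
print(f"spherical PCA, contaminated data: {ang_spherical_cont:.1f} deg")
```

In this sketch, the angle to the population eigenvector is the quantity to watch: projecting onto the unit sphere caps the influence any single observation can have on the estimated direction, which is the intuition behind the robustness of SPCA. A single finite-dimensional run like this only illustrates the asymptotic statements and does not verify them.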