Abstract

Principal component analysis (PCA) is known to be sensitive to outliers, which is why various robust PCA variants have been proposed in the literature. A recent model, called reaper, aims to find the principal components by solving a convex optimization problem. Usually the number of principal components must be determined in advance, and the minimization is performed over symmetric positive semi-definite matrices of the size of the data, although the number of principal components is substantially smaller. This prohibits its use when the dimension of the data is large, which is often the case in image processing. In this paper, we propose a regularized version of reaper which promotes a small number of principal components by penalizing the nuclear norm of the corresponding orthogonal projector. If only an upper bound on the number of principal components is available, our approach can be combined with the L-curve method to reconstruct the appropriate subspace. Our second contribution is a matrix-free algorithm for finding a minimizer of the regularized reaper model which is also suited for high-dimensional data. The algorithm couples a primal-dual minimization approach with a thick-restarted Lanczos process. This appears to be the first efficient convex variational method for robust PCA that can handle high-dimensional data. As a side result, we discuss the topic of bias in robust PCA. Numerical examples demonstrate the performance of our algorithm.
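
For illustration only, the following sketch assumes the regularized model takes the form of minimizing, over 0 ⪯ P ⪯ I, the sum of the residuals ||x_i - P x_i||_2 plus alpha times the trace of P, where the trace equals the nuclear norm on the positive semi-definite cone. It solves this small dense problem with the generic modeling package cvxpy and is therefore only feasible for low-dimensional data, in contrast to the matrix-free algorithm developed in the paper; the function name and the choice of alpha are illustrative.

    import numpy as np
    import cvxpy as cp

    def regularized_reaper_dense(X, alpha):
        """Sketch: minimize sum_i ||x_i - P x_i||_2 + alpha * tr(P) over 0 <= P <= I.

        X is an (n, D) array whose rows are the data points; the variable P is a
        dense D x D matrix, so this is only practical for small D.
        """
        n, D = X.shape
        P = cp.Variable((D, D), symmetric=True)
        residuals = cp.norm(X - X @ P, 2, axis=1)             # ||x_i - P x_i||_2 for each row
        objective = cp.sum(residuals) + alpha * cp.trace(P)   # trace = nuclear norm on the PSD cone
        constraints = [P >> 0, np.eye(D) - P >> 0]            # 0 <= P <= I in the Loewner order
        cp.Problem(cp.Minimize(objective), constraints).solve()
        return P.value

The leading eigenvectors of the returned matrix span the recovered subspace, and its trace indicates the effective number of principal components selected by the penalty.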

Highlights

  • Principal component analysis (PCA) [37] realizes dimensionality reduction of data by projecting them onto the affine subspace that minimizes the sum of the squared Euclidean distances between the data points and their projections (see the sketch after this list)

  • PCA is very sensitive to outliers, which is why various robust approaches have been developed in robust statistics [17,28,46] and in nonlinear optimization

  • While robust PCA models that can handle high-dimensional data are usually non-convex, the reaper model provides a convex relaxation
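
The first highlight can be made concrete with a minimal numpy sketch of classical (non-robust) PCA via the singular value decomposition; the data set and the target dimension d = 2 below are illustrative only.

    import numpy as np

    def pca_project(X, d):
        """Project the rows of X onto the d-dimensional affine subspace that
        minimizes the sum of squared Euclidean distances to the data points."""
        mean = X.mean(axis=0)                        # offset of the affine subspace
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        U = Vt[:d].T                                 # orthonormal basis of the subspace (D x d)
        return mean + (X - mean) @ U @ U.T           # projections of the data points

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3)) @ np.diag([5.0, 1.0, 0.1])  # nearly planar data
    X_proj = pca_project(X, d=2)
    print(np.sum((X - X_proj) ** 2))                 # small residual sum of squares

A single gross outlier added to X can tilt this subspace noticeably, which is the sensitivity that the robust variants mentioned above address.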

Summary

Introduction

Principal component analysis (PCA) [37] realizes dimensionality reduction of data by projecting them onto the affine subspace that minimizes the sum of the squared Euclidean distances between the data points and their projections. Related approaches such as [7,32,51] separate the low-rank component from the column-sparse one using different norms in the variational model. Another group of robust PCA methods replaces the squared L2 norm in the PCA model by the L1 norm [18]. Instead of the previous models, which minimize over the small number of directions spanning the low-dimensional subspace, it is possible to minimize over the orthogonal projectors onto the desired subspace. This has the advantage that the minimization can be performed over symmetric positive semi-definite matrices, e.g., using methods from semi-definite programming, and the disadvantage that the dimension of the projectors is now as large as that of the data.
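
To make the two parameterizations concrete, the following numpy sketch builds the orthogonal projector P = U U^T from d orthonormal spanning directions U and checks that both yield the same projection; note that P is a symmetric positive semi-definite D x D matrix even though U has only D*d entries, which is precisely the dimensional drawback mentioned above. The dimensions are chosen arbitrarily for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    D, d = 100, 3
    # d orthonormal directions spanning the low-dimensional subspace (D x d)
    U, _ = np.linalg.qr(rng.standard_normal((D, d)))
    # orthogonal projector onto span(U): symmetric, PSD and idempotent, but of size D x D
    P = U @ U.T
    assert np.allclose(P, P.T) and np.allclose(P @ P, P)
    assert np.isclose(np.trace(P), d)            # trace (= nuclear norm) equals the subspace dimension
    x = rng.standard_normal(D)
    assert np.allclose(P @ x, U @ (U.T @ x))     # both parameterizations project identically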

Notation and Preliminaries
Regularized REAPER
Primal-Dual Algorithm
Matrix-Free Realization
Matrix-Free Primal Update
Matrix-Free Dual Update
Matrix-Free Projection onto the Orthoprojectors
Matrix-Free Robust PCA by RREAPER
Performance Analysis
Incorporating the Offset
Numerical Examples
Nuclear Norm and Truncated Hypercube Constraints
Choosing the Parameters
Face Approximation
Conclusion