Abstract

Principal component analysis (PCA) aims at estimating the direction of maximal variability of a high-dimensional data set. A natural question is: does this task become easier, and estimation more accurate, when we exploit additional knowledge about the principal vector? We study the case in which the principal vector is known to lie in the positive orthant. Similar constraints arise in a number of applications, ranging from the analysis of gene expression data to spike sorting in neural signal processing. In the unconstrained case, the estimation performance of PCA has been precisely characterized using random matrix theory, under a statistical model known as the spiked model. It is known that the estimation error undergoes a phase transition as the signal-to-noise ratio crosses a certain threshold. Unfortunately, tools from random matrix theory have no bearing on the constrained problem. Despite this challenge, we develop an analogous characterization in the constrained case, within a one-spike model. In particular: 1) we prove that the estimation error undergoes a similar phase transition, albeit at a different threshold in signal-to-noise ratio, which we determine exactly; 2) we prove that, unlike in the unconstrained case, the estimation error depends on the spike vector, and we characterize the least favorable vectors; and 3) we show that a non-negative principal component can be approximately computed, under the spiked model, in nearly linear time. This holds even though the problem is non-convex and, in general, NP-hard to solve exactly.
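
To make the setting concrete, the following is a minimal sketch of non-negative PCA on a one-spike model. It is an illustration only and is not the algorithm analyzed in the paper: it swaps in a simple projected power iteration as a stand-in heuristic. The model form X = beta * v v^T + Z (symmetric Gaussian noise with entry variance 1/n), the function names, and the parameter values are all assumptions chosen for the example.

```python
import math
import random


def normalize(u):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]


def spiked_matrix(v, beta, rng):
    """Assumed one-spike model: X = beta * v v^T + Z.

    Z is symmetric with independent Gaussian entries of variance 1/n
    on and above the diagonal (an assumption made for this sketch).
    """
    n = len(v)
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(0.0, 1.0 / math.sqrt(n))
            x[i][j] = x[j][i] = beta * v[i] * v[j] + noise
    return x


def nonneg_power_iteration(x, iters=50):
    """Heuristic: power iteration with projection onto the positive orthant.

    Each step multiplies by X, clips negative entries to zero, and
    renormalizes. This is a stand-in, not the paper's method.
    """
    n = len(x)
    u = normalize([1.0] * n)  # start from a non-negative vector
    for _ in range(iters):
        w = [sum(row[j] * u[j] for j in range(n)) for row in x]
        w = [max(a, 0.0) for a in w]  # projection onto the positive orthant
        if not any(w):
            break  # projection collapsed to zero; keep the last iterate
        u = normalize(w)
    return u


if __name__ == "__main__":
    rng = random.Random(0)
    n, beta = 150, 3.0  # hypothetical dimension and signal-to-noise ratio
    v = normalize([abs(rng.gauss(0.0, 1.0)) for _ in range(n)])
    u = nonneg_power_iteration(spiked_matrix(v, beta, rng))
    overlap = sum(a * b for a, b in zip(u, v))
    print(f"overlap between estimate and spike: {overlap:.3f}")
```

The overlap between the estimate and the planted vector is one way to observe the estimation error discussed above; rerunning the sketch over a range of `beta` values gives a rough empirical view of how accuracy changes with signal-to-noise ratio, though it says nothing about the exact threshold established in the paper.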
