Abstract

We give a probabilistic analysis of a phenomenon in statistics which, until recently, has not received a convincing explanation. This phenomenon is that the leading principal components tend to possess more predictive power for a response variable than lower-ranking ones, despite the procedure being unsupervised. Our result, in its most general form, shows that the phenomenon goes far beyond the context of linear regression and classical principal components: if an arbitrary distribution for the predictor $X$ and an arbitrary conditional distribution for $Y \vert X$ are chosen, then any measurable function $g(Y)$, subject to a mild condition, tends to be more correlated with the higher-ranking kernel principal components than with the lower-ranking ones. The ``arbitrariness'' is formulated in terms of unitary invariance, and the tendency is explicitly quantified by exploring how unitary invariance relates to the Cauchy distribution. The most general results are, for technical reasons, shown for the case where the kernel space is finite-dimensional. The occurrence of this tendency in real-world databases is also investigated to show that our results are consistent with observation.
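The link between unitary invariance and the Cauchy distribution mentioned above can be illustrated with a classical fact: if a random vector in $\mathbb{R}^2$ has a rotationally (unitarily) invariant distribution, such as the bivariate standard normal, then the ratio of its two coordinates is standard Cauchy. The short simulation below is our own sketch of that fact, not the paper's construction; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Bivariate standard normal: a rotationally invariant distribution on R^2.
Z = rng.normal(size=(100_000, 2))
# Classical fact: the coordinate ratio of a rotationally invariant
# vector follows a standard Cauchy distribution.
ratio = Z[:, 0] / Z[:, 1]
# A Kolmogorov-Smirnov test should not reject the Cauchy hypothesis.
print(stats.kstest(ratio, "cauchy"))
```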

Highlights

  • Kernel principal component analysis ([38], [39]) is one of the most widely used methods for unsupervised dimension reduction

  • A core feature of this method is its use of the “kernel trick”: if an operation depends only on inner products, a lower-dimensional nonlinear projection of the data can be extracted without dealing directly with the projection coefficients (see the sketch after these highlights). This versatile idea appears in other settings, such as the support vector machine and, more recently, sufficient dimension reduction

  • In this paper, we show that higher-ranking kernel principal components tend to be more informative about the response than lower-ranking ones
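To make the kernel trick concrete, here is a minimal sketch of kernel PCA that works entirely through the Gram matrix of pairwise kernel evaluations, never forming the feature map explicitly. The Gaussian (RBF) kernel, the function names, and the synthetic data are our illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2): every entry
    # depends on the data only through inner products / distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Center the kernel matrix in feature space (double centering).
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # Eigendecompose; eigenvectors yield the principal-component scores
    # without ever touching the (possibly infinite-dimensional) feature map.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Score of point i on component k is sqrt(vals[k]) * vecs[i, k].
    return vecs * np.sqrt(np.maximum(vals, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
scores = kernel_pca(X, n_components=3)  # leading kernel principal components
```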


Summary

Introduction

Kernel principal component analysis ([38], [39]) is one of the most widely used methods for unsupervised dimension reduction. In the context of the linear regression model $Y = \beta^{\mathsf{T}} X + \varepsilon$, with $\Sigma = \mathrm{Var}(X)$, earlier work considered $U_1, \dots, U_p$, the first through $p$th principal components of the $p$-dimensional random vector $X$, and the correlation $\mathrm{Corr}(A, B) = \mathrm{Cov}(A, B)/\sqrt{\mathrm{Var}(A)\,\mathrm{Var}(B)}$ of real random variables $A$ and $B$. It was shown that, under some assumptions on $\beta$ and $\Sigma$, $P\{\mathrm{Corr}^2(Y, U_i) \geq \mathrm{Corr}^2(Y, U_j)\} \geq 1/2$ whenever $i < j$. This refined the empirical estimate of 0.65 computed by simulation in [30]. These results confirm theoretically that principal components, though they are derived without reference to any response, do tend to contain some information about the response, with the higher-ranking ones tending to have more predictive value than the lower-ranking ones.
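The inequality above is easy to probe by simulation. The sketch below is our illustration, not the cited study: the choices of a $\Sigma$ with decreasing eigenvalues, $\beta$ uniform on the unit sphere, and standard normal noise are assumptions made for the example. It estimates $P\{\mathrm{Corr}^2(Y, U_1) \geq \mathrm{Corr}^2(Y, U_2)\}$ and typically returns a value above $1/2$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 5, 500, 2000
lam = np.arange(p, 0, -1.0)  # decreasing eigenvalues of Sigma (assumption)
hits = 0
for _ in range(reps):
    beta = rng.normal(size=p)
    beta /= np.linalg.norm(beta)                 # beta uniform on the sphere
    X = rng.normal(size=(n, p)) * np.sqrt(lam)   # Var(X) = diag(lam)
    Y = X @ beta + rng.normal(size=n)
    # Sample principal components of X, ranked by explained variance.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Xc @ Vt.T                                # columns: 1st, 2nd, ... PCs
    r2 = [np.corrcoef(Y, U[:, k])[0, 1] ** 2 for k in (0, 1)]
    hits += r2[0] >= r2[1]                       # compare i = 1 with j = 2
print(hits / reps)  # estimate of P{Corr^2(Y, U_1) >= Corr^2(Y, U_2)}
```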

Some empirical evidence
Technical construction of KPCA
Predictive power of KPCA with finite-dimensional kernels
Unitarily invariant random functions and operators
Nonparametric regression case
Arbitrary conditional distribution case
Predictive power of KPCA with infinite-dimensional kernels
Conclusion