Abstract

Principal component analysis (PCA) is an important tool for exploring data. The conventional approach to PCA yields a solution that favours structures with large variances; this is sensitive to outliers and can obscure interesting underlying structures. One equivalent definition of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition opens up more flexibility in the analysis of principal components, which is useful for enhancing PCA. In this paper we introduce scales into PCA by maximizing only the sum of squared pairwise distances between projections for pairs of data points whose distances lie within a chosen interval of values [l,u]. The resulting principal component decompositions in Multiscale PCA depend on the point (l,u) in the plane, and for each point we define projectors onto the principal components. Cluster analysis of these projectors reveals the structures in the data at various scales. Each structure is described by the eigenvectors at the medoid point of the cluster that represents it. We also use the distortion of projections as a criterion for choosing an appropriate scale, especially for data with outliers. The method was tested on both artificial distributions of data and real data. For data with multiscale structures, the method was able to reveal the different structures of the data and also to reduce the effect of outliers in the principal component analysis.

Highlights

  • In 1901, Pearson proposed approximating high dimensional data with lines and planes and invented the Principal Component Analysis (PCA)

  • One of the definitions of principal component analysis (PCA) is that PCA finds subspaces that maximize the sum of point-to-point squared distances between the orthogonal projections of the data points onto them

  • We introduce the Multiscale PCA (MPCA) algorithm to enhance the robustness of PCA, especially in revealing hidden structures that may be present in a dataset but that the conventional approach might not reveal



Introduction

In 1901, Pearson proposed approximating high dimensional data with lines and planes and invented Principal Component Analysis (PCA). One definition of PCA is that it finds subspaces (lines, planes, or higher-dimensional subspaces) that maximize the sum of point-to-point squared distances between the orthogonal projections of the data points onto them. In the definition of multiscale PCA we maximize the sum of point-to-point squared distances between the orthogonal projections of data points only for pairs of points whose distances lie in some interval. The result is a family of PCA decompositions of the data that depends on the chosen interval. We also propose a criterion for determining the appropriate scale for computing the principal components for data with outliers.
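The interval-restricted maximization described above reduces to an eigenproblem: the unit direction w maximizing the sum of ((x_i - x_j)·w)² over the selected pairs is the top eigenvector of the matrix S = Σ (x_i - x_j)(x_i - x_j)ᵀ, summed over pairs whose distance falls in [l, u]. The following NumPy sketch illustrates this; the function name and interface are our own illustration, not code from the paper.

```python
import numpy as np

def multiscale_pca(X, l, u, n_components=1):
    """Illustrative multiscale PCA sketch: maximize the sum of squared
    pairwise distances between projections, counting only pairs of
    points whose original distance lies in the interval [l, u]."""
    n = X.shape[0]
    # all pairwise Euclidean distances between data points
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # upper-triangle indices enumerate each pair once
    i, j = np.triu_indices(n, k=1)
    sel = (D[i, j] >= l) & (D[i, j] <= u)
    diffs = X[i[sel]] - X[j[sel]]  # differences for the selected pairs
    # maximizing sum((diffs @ w)**2) over unit vectors w is an
    # eigenproblem on S = diffs.T @ diffs; take the leading eigenvectors
    S = diffs.T @ diffs
    vals, vecs = np.linalg.eigh(S)  # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_components]].T

# Usage: with l = 0 and u = infinity every pair is counted and, for
# centered data, S = n * X.T @ X, so the result matches ordinary PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 0.2])
X -= X.mean(axis=0)
W = multiscale_pca(X, l=0.0, u=np.inf, n_components=1)
```

Shrinking u suppresses the influence of distant pairs, which is what makes the decomposition less sensitive to outliers than conventional PCA.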

Weighted PCA
Conclusion

