Abstract

Principal component analysis (PCA) has long been an important tool for dimension reduction. However, it must be used with care in some settings, notably when the dimension is high and the sample size is small. In general, either a low dimension with a large sample size or a large signal-to-noise ratio is needed to guarantee the consistency of the leading eigenvalues and eigenvectors obtained by PCA. Cross data matrix (CDM)-based PCA is an alternative way to estimate the principal components: the data are split into two subsets, and a singular value decomposition is computed for the cross product of the corresponding covariance matrices. It has been shown that CDM-based PCA has a broader region of consistency than ordinary PCA for the leading eigenvalues and eigenvectors. Although the difference in regions of consistency is well studied, a question of both practical and theoretical interest is how the two methods differ in estimating eigenvalues and eigenvectors, especially when both fall in a common region of consistency. In this article, we derive finite-sample approximation results as well as the asymptotic behavior of CDM-based PCA via matrix perturbation. We also derive a measure for comparing CDM-based PCA with ordinary PCA. This measure depends only on the data dimension, the noise correlations and the noise-to-signal ratio (NSR). Using this measure, we develop an algorithm that selects good partitions and integrates the results from these partitions to form a final CDM-based PCA estimate. Numerical and real data examples are presented for illustration.
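As a rough illustration of the split-and-SVD idea described above, the following Python sketch estimates leading eigenvalues and eigenvectors from a single random partition. It is only a sketch under stated assumptions: the function name `cdm_pca`, the even random split, the `1/sqrt((n1-1)(n2-1))` scaling of the cross data matrix, and the averaging of the two projected direction estimates are choices made here for illustration and are not taken from this article. The partition-selection algorithm and the comparison measure mentioned in the abstract are not implemented.

```python
import numpy as np


def cdm_pca(X, n_components, seed=None):
    """Illustrative cross-data-matrix PCA on one random split.

    X has shape (n_samples, n_features). Returns (eigenvalue estimates,
    eigenvector estimates as columns). All design details here are
    assumptions for illustration, not the article's exact procedure.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    half = X.shape[0] // 2
    idx1, idx2 = perm[:half], perm[half:]

    # Center each subset with its own sample mean.
    X1 = X[idx1] - X[idx1].mean(axis=0)
    X2 = X[idx2] - X[idx2].mean(axis=0)
    n1, n2 = X1.shape[0], X2.shape[0]

    # Cross data matrix (n1 x n2): inner products between the two subsets.
    S = X1 @ X2.T / np.sqrt((n1 - 1) * (n2 - 1))

    # Singular values of S serve as the eigenvalue estimates.
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    U, s, V = U[:, :n_components], s[:n_components], Vt[:n_components].T

    # Map the singular vectors back to feature space, one estimate per subset.
    H1 = X1.T @ U
    H2 = X2.T @ V
    H1 /= np.linalg.norm(H1, axis=0)
    H2 /= np.linalg.norm(H2, axis=0)

    # Each column pair has a non-negative inner product (it is proportional
    # to the matching singular value), so the two estimates can be averaged
    # without a sign flip. Averaging is an assumption of this sketch.
    H = H1 + H2
    H /= np.linalg.norm(H, axis=0)
    return s, H


# Synthetic example: one strong direction along the first coordinate.
demo_rng = np.random.default_rng(0)
X_demo = demo_rng.normal(size=(200, 50))
X_demo[:, 0] *= 10.0
eigvals, eigvecs = cdm_pca(X_demo, n_components=2, seed=0)
print(eigvals[0], abs(eigvecs[0, 0]))
```

Because the result depends on how the data are split, different seeds give different estimates, which is the kind of variation that selecting and combining partitions is meant to address.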
