The kernel PCA algorithms for wide data. Part I: Theory and algorithms

W. Wu, D.L. Massart, S. De Jong

doi:10.1016/s0169-7439(97)00010-5

Abstract

Four classic PCA algorithms, NIPALS, the power method (POWER), singular value decomposition (SVD) and eigenvalue decomposition (EVD), are modified into kernel versions to analyse wide data sets. For such data sets, which have many variables and fewer objects, the classic algorithms become very inefficient because the associated matrix X′·X (p × p) is very large. The kernel algorithms are instead based on the kernel matrix X·X′ (n × n). They yield the same principal components but, when the number of variables is higher than the number of objects, they are faster than the corresponding classic algorithms. Simulation results confirm this property and also show that the kernel EVD is the most efficient algorithm for wide data sets, although the difference between the kernel SVD and the kernel EVD is not very large. It is therefore recommended to use the kernel EVD for wide data sets, whether all PCs or only the first few are required. A Matlab PCA program that deals efficiently with all kinds of data sets was developed.
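The kernel EVD idea can be illustrated with a minimal sketch. This is not the authors' Matlab program: the function name `kernel_evd_pca`, the column-centring step, the rank tolerance and the use of Python with NumPy are all assumptions made for illustration. The sketch relies only on the standard identity that, if X·X′ has eigenvectors U and eigenvalues λ, the scores are T = U·diag(√λ) and the loadings are P = X′·U·diag(1/√λ).

```python
import numpy as np


def kernel_evd_pca(X, n_components=None):
    """Illustrative kernel-EVD PCA for a wide matrix X (n objects x p variables).

    Hypothetical helper, not the published program. Returns scores T (n x k),
    loadings P (p x k) and the eigenvalues of the n x n kernel matrix.
    """
    # Assumption: column-centre the data; the abstract does not state the preprocessing.
    Xc = X - X.mean(axis=0)

    # Work with the small n x n kernel matrix instead of the p x p matrix X'X.
    K = Xc @ Xc.T

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, U = np.linalg.eigh(K)
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]

    # Drop numerically zero eigenvalues (heuristic relative tolerance, an assumption).
    keep = eigvals > 1e-10 * eigvals[0]
    eigvals, U = eigvals[keep], U[:, keep]

    if n_components is not None:
        eigvals, U = eigvals[:n_components], U[:, :n_components]

    s = np.sqrt(eigvals)        # singular values of the centred data
    scores = U * s              # T = U diag(s)
    loadings = Xc.T @ U / s     # P = X' U diag(1/s), orthonormal columns
    return scores, loadings, eigvals
```

With all components retained, the product of scores and transposed loadings reproduces the centred data, and the eigenvalues equal the squared singular values of the centred data, which is one way to check that the kernel route gives the same principal components as a direct decomposition.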
