Abstract
Over the last decade, kernel-based nonlinear learning machines, e.g., support vector machines (SVMs) Vapnik (1995), kernel principal component analysis (KPCA) Scholkopf (1998), and kernel Fisher discriminant analysis (KFDA) Mika (1999), have attracted considerable attention in the fields of pattern recognition and machine learning, and have been successfully applied in many real-world applications Mika (1999); Yang (2002); Lu (2003); Yang (2004). Basically, kernel-based learning methods work by mapping the input data space, X, into a high-dimensional space, F, called the kernel feature space, Φ : X −→ F, and then building linear machines in the kernel feature space to implement their nonlinear counterparts in the input space. This procedure, also known as “kernelization”, usually relies on the so-called kernel trick: the inner product of each pair of mapped data points in the kernel feature space is computed by a kernel function, rather than by explicitly evaluating the nonlinear map Φ. The kernel trick provides an easy way to kernelize linear machines. However, in many cases, formulating a kernel machine via the kernel trick can be difficult or even impossible. For example, it is quite difficult to formulate the kernel version of the direct discriminant analysis algorithm (KDDA) Lu (2003) using the kernel trick. Moreover, some recently developed linear discriminant analysis schemes, such as uncorrelated linear discriminant analysis (ULDA) Ye (2004) and orthogonal linear discriminant analysis (OLDA) Ye (2005), which have been shown to be efficient in many real-world applications Ye (2004), cannot be kernelized directly via the kernel trick, since these schemes first require computing the singular value decomposition (SVD) of an interim matrix, namely H_t (see Ye (2004)), whose columns are generally of infinite dimension in the case of the kernel feature space. Theoretically, the kernel feature space is in general an infinite-dimensional Hilbert space. However, given a training data set {x_i} (i = 1, 2, . . . , n), known kernel machines actually operate in a subspace of the kernel feature space, span{Φ(x_i)} (i = 1, 2, . . . , n), which can be embedded into a finite-dimensional Euclidean space with all of the data’s geometrical measurements, e.g., distances and angles, preserved Xiong (2005). This finite-dimensional embedding space, called the empirical kernel feature space, provides a unified framework for kernelizing all kinds of linear machines: with this framework, kernel machines can be obtained by simply running the corresponding linear machines in the empirical kernel feature space.
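To make the embedding concrete, the following Python sketch (ours, not taken from the paper) illustrates one standard construction of such a finite-dimensional embedding: the empirical kernel map obtained from the eigendecomposition of the kernel (Gram) matrix. The helper names rbf_kernel and empirical_kernel_map are illustrative assumptions, not functions defined in the paper.

```python
# Minimal sketch of the empirical kernel map idea: with K_ij = k(x_i, x_j)
# eigendecomposed as K = P Λ Pᵀ, each point x is embedded as
#   Φ_e(x) = Λ^{-1/2} Pᵀ (k(x, x_1), ..., k(x, x_n))ᵀ,
# so that Φ_e(x_i) · Φ_e(x_j) = K_ij, i.e., inner products (hence distances
# and angles) of the mapped training data in span{Φ(x_i)} are preserved in a
# finite-dimensional Euclidean space.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-gamma * np.sum(d ** 2, axis=-1))

def empirical_kernel_map(X_train, kernel):
    """Return a map sending any x into the empirical kernel feature space."""
    K = kernel(X_train, X_train)             # n x n kernel (Gram) matrix
    eigvals, eigvecs = np.linalg.eigh(K)     # K = P Λ Pᵀ (symmetric PSD)
    keep = eigvals > 1e-10                   # drop numerically zero directions
    P, lam = eigvecs[:, keep], eigvals[keep]
    W = P / np.sqrt(lam)                     # columns scaled by Λ^{-1/2}
    return lambda X: kernel(X, X_train) @ W  # Φ_e(x) = Λ^{-1/2} Pᵀ k_x

# Usage: any linear machine (PCA, LDA, a linear SVM, ...) applied to phi(X)
# acts as a kernelized machine on X.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
phi = empirical_kernel_map(X, rbf_kernel)
Z = phi(X)
# Inner products in the embedding reproduce the kernel matrix:
assert np.allclose(Z @ Z.T, rbf_kernel(X, X), atol=1e-8)
```

Under this sketch, a linear method is kernelized simply by feeding it the finite-dimensional representations Z instead of the raw inputs, without ever manipulating the (possibly infinite-dimensional) map Φ explicitly.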