Kernels: Regularization and Optimization

Cheng Soon Ong

doi:10.25911/5d7a2861e2e73

Abstract

This thesis extends the paradigm of machine learning with kernels. This paradigm is based on the idea of generalizing an inner product between vectors to a similarity measure between objects. The kernel implicitly defines a feature mapping between the space of objects and the space of functions, called the reproducing kernel Hilbert space. There have been many successful applications of positive semidefinite kernels in diverse fields. Among the reasons for its success are a theoretically motivated regularization method and efficient algorithms for optimizing the resulting problems. Since the kernel has to effectively capture the domain knowledge in an application, we study the problem of learning the kernel itself from training data. The proposed solution is a kernel on the space of kernels itself, which we called a hyperkernel. This provides a method for regularization via the norm of the kernel. We show that for several machine learning tasks, such as binary classification, regression and novelty detection, the resulting optimization problem is a semidefinite program. We solve the corresponding optimization problems using the same parameter settings across all problems, and demonstrate that we have further automated machine learning methods. We observe that the restriction for kernels to be positive semidefinite can be removed. The non-positive kernels, called indefinite kernels, have corresponding functional theory, and define reproducing kernel Krĕin spaces. We derive machine learning problems with indefinite kernels and prove the representer theorem as well as generalization error bounds. We provide theoretical and experimental evidence to support the idea of regularization by early stopping of conjugate gradient type algorithms. Conjugate gradient type algorithms are iterative methods that generate solutions in Krylov subspaces, and exhibit semi-convergence. We analyse the sequence of Krylov subspaces that determine the associated filter function on the spectrum of the inverse problem, and quantitatively investigate semi-convergence. These algorithms are then used for machine learning with indefinite kernels.

Full Text