Abstract

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and contrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.
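
To make the class of methods concrete, the Python sketch below (purely illustrative, not the authors' implementation) fits kernel ridge regression and kernel principal component regression as two spectral filters applied to the eigendecomposition of the empirical kernel matrix; the Gaussian kernel, the function names, and the exact filter parameterization are assumptions of the sketch.

    import numpy as np

    def gaussian_kernel(A, B, bandwidth=1.0):
        # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-sq / (2.0 * bandwidth**2))

    def spectral_fit(K, y, lam, method="ridge"):
        # Dual coefficients alpha such that f_hat(x) = sum_i alpha_i k(x, x_i).
        # Both estimators apply a filter g_lam to the eigenvalues of K / n:
        #   ridge (Tikhonov):          g_lam(t) = 1 / (t + lam)
        #   pcr   (spectral cut-off):  g_lam(t) = 1 / t if t >= lam, else 0
        n = len(y)
        evals, evecs = np.linalg.eigh(K / n)
        if method == "ridge":
            g = 1.0 / (evals + lam)
        elif method == "pcr":
            g = np.where(evals >= lam, 1.0 / np.maximum(evals, 1e-12), 0.0)
        else:
            raise ValueError(method)
        return evecs @ (g * (evecs.T @ y)) / n

    # Predictions at new points: gaussian_kernel(X_new, X_train) @ alpha

Only the filter changes between the two estimators, which is what makes a unified risk analysis over the whole class possible.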

Highlights

  • Suppose that the observed data consist of zi = (xi, yi), i = 1, . . . , n, where yi ∈ Y ⊆ R and xi ∈ X ⊆ Rd

  • The risk of an estimator f̂ is Rρ(f̂) = E ∫X (f†(x) − f̂(x))² dρX(x) = E ‖f† − f̂‖²ρX, where the expectation is computed over z1, . . . , zn, and ‖·‖ρX denotes the norm on L2(ρX); we seek estimators f̂ which minimize Rρ(f̂). This is a version of the random design nonparametric regression problem (a small simulation sketch follows this list)

  • We focus on regularization and kernel methods for estimating f†
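
The following Python sketch makes this setting concrete with hypothetical choices (not from the paper's experiments): d = 1, ρX uniform on [0, 1], and a Monte Carlo approximation of the L2(ρX) error for one draw of the training data; f†, the noise level, and the sample sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative choices: d = 1, rho_X = Uniform[0, 1], Gaussian noise.
    f_dagger = lambda x: np.sin(2.0 * np.pi * x[:, 0])   # stand-in for the true regression function
    n, sigma = 200, 0.3

    # Observed data z_i = (x_i, y_i), i = 1, ..., n, with y_i = f_dagger(x_i) + noise.
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    y = f_dagger(X) + sigma * rng.standard_normal(n)

    def squared_l2_error(f_hat, n_test=100_000):
        # Monte Carlo approximation of ||f_dagger - f_hat||^2 in L2(rho_X) for one
        # draw of the training data; the risk R_rho also averages over z_1, ..., z_n.
        X_test = rng.uniform(0.0, 1.0, size=(n_test, 1))
        return np.mean((f_dagger(X_test) - f_hat(X_test)) ** 2)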


Summary

Introduction

One consequence of the theorem is that the regularization methods studied in this paper (including KRR and KPCR) achieve the minimax rate for estimating f† in a variety of settings. A second consequence is that certain regularization methods (including KPCR, but not KRR) may adapt to favorable regularity of f† to attain even faster convergence rates, while others (notably KRR) are limited in this regard due to a well-known saturation effect (Neubauer, 1997; Mathé, 2005; Bauer et al., 2007). This illustrates a striking advantage that KPCR may have over KRR in these settings.
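
The saturation effect can be made tangible with a short numerical check (again illustrative, not taken from the paper): using the standard definition that a filter g_λ has qualification q when sup_t t^q (1 − t g_λ(t)) = O(λ^q) as λ → 0, the snippet below shows the ridge (Tikhonov) filter stalling at q = 1 while the spectral cut-off filter underlying KPCR keeps shrinking like λ^q for larger q.

    import numpy as np

    # Residual of a spectral filter: r_lam(t) = 1 - t * g_lam(t).
    # A filter has qualification q if sup_t t^q * r_lam(t) = O(lam^q) as lam -> 0.
    t = np.linspace(1e-6, 1.0, 200_000)

    for lam in (1e-2, 1e-3, 1e-4):
        r_ridge = lam / (t + lam)             # ridge / Tikhonov: 1 - t / (t + lam)
        r_cutoff = (t < lam).astype(float)    # spectral cut-off (KPCR): 1 - 1[t >= lam]
        for q in (1.0, 2.0):
            print(f"lam={lam:.0e}, q={q}: "
                  f"ridge sup = {np.max(t**q * r_ridge):.2e}, "
                  f"cut-off sup = {np.max(t**q * r_cutoff):.2e}")

    # For q = 2 the ridge supremum stays of order lam (saturation at q = 1),
    # while the cut-off supremum is of order lam^2.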

Related work
Statistical setting and assumptions
Regularization
Finite-rank operators of interest
Basic definitions
Estimators
General bound on the risk
Implications for kernels characterized by their eigenvalues’ rate of decay
Parametric rates for finite-dimensional kernels and subspaces
Simulated data
Real data
Discussion
Bias-variance decomposition
Translation to vector and matrix notation
Bias bound
Variance bound
Finishing the proof of Theorem 1
Sums of random operators
Differences of powers of bounded operators