Let $f_\lambda$ be the regularized solution for the problem of estimating a function or vector $f_0$ from noisy data $y_i = L_i f_0 + \varepsilon_i$, $i = 1, \dots, n$, where the $L_i$ are linear functionals. A prominent method for selecting the crucial regularization parameter $\lambda$ is generalized cross-validation (GCV). GCV is known to have good asymptotic properties as $n \to \infty$, but it may be unreliable for small or medium-sized $n$, sometimes giving an estimate of $\lambda$ that is far too small. We propose a new robust GCV method (RGCV), which chooses $\lambda$ to be the minimizer of $\gamma V(\lambda) + (1 - \gamma) F(\lambda)$, where $V(\lambda)$ is the GCV function, $F(\lambda)$ is an approximate average measure of the influence of each data point on $f_\lambda$, and $\gamma \in (0, 1)$ is a robustness parameter. We show that, for any $n$, RGCV is less likely than GCV to choose a very small value of $\lambda$, making it a more robust method. We also show that RGCV has good asymptotic properties as $n \to \infty$ for general linear operator equations with uncorrelated errors. The function $EF(\lambda)$ approximates the risk $ER(\lambda)$ for values of $\lambda$ that are asymptotically slightly smaller than the minimizer of $ER(\lambda)$ (where $V(\lambda)$ may not approximate the risk well). The ‘expected’ RGCV estimate is asymptotically optimal as $n \to \infty$ with respect to the ‘robust risk’ $\gamma ER(\lambda) + (1 - \gamma) v(\lambda)$, where $v(\lambda)$ is the variance component of the risk, and it attains the optimal decay rate with respect to $ER(\lambda)$ and stronger error criteria. The GCV and RGCV methods are compared in numerical simulations for the problem of estimating the second derivative from noisy data. The results for RGCV with $n = 51$ are consistent with the asymptotic results, and, for a large range of $\gamma$ values, RGCV is more reliable and accurate than GCV.
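To make the selection rule concrete, the following is a minimal sketch of RGCV for the discrete case in which the functionals $L_i$ form the rows of a matrix $X$ and $f_\lambda$ is the Tikhonov (ridge) estimate with influence matrix $A(\lambda)$. The abstract does not give a formula for $F(\lambda)$; the form $F(\lambda) = \mu(\lambda) V(\lambda)$ with $\mu(\lambda) = \operatorname{tr}(A(\lambda)^2)/n$ used below is an assumed proxy for the average-influence term, and all function names and the synthetic data are illustrative only.

```python
import numpy as np

def influence_matrix(X, lam):
    # Hat matrix A(lam) = X (X^T X + lam I)^{-1} X^T for the Tikhonov (ridge)
    # solution f_lam = argmin_f ||X f - y||^2 + lam ||f||^2.
    n, p = X.shape
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

def gcv(y, A):
    # Standard GCV function V(lam) = n ||(I - A(lam)) y||^2 / [tr(I - A(lam))]^2.
    n = y.size
    r = y - A @ y
    return n * float(r @ r) / (n - np.trace(A)) ** 2

def rgcv(y, A, gamma):
    # Robust criterion gamma*V(lam) + (1 - gamma)*F(lam).  Here F(lam) is taken
    # to be mu(lam)*V(lam) with mu(lam) = tr(A(lam)^2)/n -- an assumed proxy for
    # the paper's average-influence measure, not a formula given in the abstract.
    V = gcv(y, A)
    mu = np.trace(A @ A) / y.size
    return gamma * V + (1.0 - gamma) * mu * V

# Toy usage: choose lambda on a log grid for a synthetic ill-conditioned problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((51, 30)) @ np.diag(0.9 ** np.arange(30))
y = X @ rng.standard_normal(30) + 0.1 * rng.standard_normal(51)
lams = np.logspace(-8, 2, 100)
lam_gcv = lams[np.argmin([gcv(y, influence_matrix(X, l)) for l in lams])]
lam_rgcv = lams[np.argmin([rgcv(y, influence_matrix(X, l), gamma=0.3) for l in lams])]
print(f"GCV lambda: {lam_gcv:.3e}, RGCV lambda: {lam_rgcv:.3e}")
```

Since $\mu(\lambda)$ grows as $\lambda \to 0$, the extra term inflates the criterion at very small $\lambda$, which reflects the robustness mechanism described above: RGCV is less likely than GCV to select a far-too-small regularization parameter.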