Abstract
Model selection is one of the fundamental problems in kernel-based algorithms, and it is commonly performed by minimizing an estimate of the generalization error. The notion of stability of learning machines and the cross-validation (CV) error are two widely used tools for analyzing generalization performance. However, both tools have disadvantages when applied to model selection: 1) the stability of learning machines is not practical because its specific value is difficult to estimate and 2) the CV-based estimate of generalization error usually has a relatively high variance, so it is prone to overfitting. To overcome these two limitations, we present a novel notion of kernel stability (KS) for deriving generalization error bounds and variance bounds of CV, and we provide an effective approach for applying KS to practical model selection. Unlike existing notions of stability of the learning machine, KS is defined on the kernel matrix; hence, it avoids the difficulty of estimating its value. We establish the relationship between KS and the popular uniform stability of the learning algorithm, and further propose several KS-based generalization error bounds and variance bounds of CV. By minimizing the proposed bounds, we present two novel KS-based criteria that can ensure good performance. Finally, we empirically analyze the performance of the proposed criteria on many benchmark data sets, which demonstrates that our KS-based criteria are sound and effective.
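To make the baseline concrete, the sketch below illustrates the standard CV-based model selection procedure that the abstract describes (choosing kernel and regularization parameters by minimizing a k-fold CV estimate of generalization error). The data, kernel family (Gaussian), learner (kernel ridge regression), and parameter grid are illustrative assumptions; the paper's KS-based criteria, which replace this CV estimate with bounds computed from the kernel matrix, are not reproduced here since the abstract does not give their closed form.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for an arbitrary benchmark set (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Baseline model selection: pick the (gamma, alpha) pair of a Gaussian (RBF)
# kernel ridge regressor that minimizes the 5-fold CV estimate of the
# generalization error. The abstract notes this estimate can have high
# variance, which motivates the KS-based selection criteria.
param_grid = {
    "alpha": [1e-3, 1e-2, 1e-1, 1.0],   # regularization strength
    "gamma": [0.01, 0.1, 1.0, 10.0],    # RBF kernel width parameter
}
search = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print("selected parameters:", search.best_params_)
print("CV mean squared error:", -search.best_score_)
```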