Abstract

Kernel methods are attractive in data analysis because they can model nonlinear similarities between observations and provide rich representations, both of which are useful for regression problems in general domains. Despite their popularity, they suffer from two primary inherent drawbacks. One is the positive definiteness requirement on the kernel function, which greatly restricts their application in some real data analysis tasks. The other is their poor scalability in massive data scenarios. In this paper, we aim to address these two problems by considering the Nyström subsampling approach for coefficient-based regularized regression (Nyström CRR for short). Nyström subsampling is an effective approach to reducing space and time complexity: it constructs a low-rank approximation of the original kernel matrix through column subsampling. Coefficient-based regularized regression provides a simple paradigm for designing indefinite kernel methods. We show that the combination of these two schemes is not only computationally efficient but also statistically consistent, with minimax optimal rates of convergence. We explicitly determine a lower bound on the subsampling level, as a function of the sample size, such that the minimax optimal convergence rates are preserved. According to our analysis, the subsampling level governs the trade-off between the computational cost and the asymptotic behavior of Nyström CRR, and is hence pivotal for algorithmic performance. To choose an appropriate subsampling level, we develop an incremental Nyström CRR for nested subsampling sets. The proposed algorithm greatly reduces the cost of cross-validation, and even allows the whole solution path through all possible subsampling levels to be computed. In our empirical studies, the incremental Nyström CRR performs effective model selection and achieves state-of-the-art results on both synthetic and real data sets.
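
To make the combination of the two schemes concrete, the following is a minimal Python sketch of one plausible form of Nyström CRR. It is not the paper's implementation. It assumes uniform column subsampling, an l2 penalty on the coefficients, and the objective (1/n)·||K_nm a - y||^2 + lam·||a||^2, where K_nm is the n-by-m block of kernel columns at the subsampled points. The function names, the tanh kernel with its `scale` and `offset` parameters, and the normalization by n are all illustrative assumptions.

```python
import numpy as np


def tanh_kernel(A, B, scale=1.0, offset=-1.0):
    # Sigmoid (tanh) kernel, used here only as an example of a kernel
    # that is in general not positive definite. Parameters are illustrative.
    return np.tanh(scale * (A @ B.T) + offset)


def nystrom_crr_fit(X, y, m, lam, kernel, rng):
    # Assumed objective: (1/n) * ||K_nm a - y||^2 + lam * ||a||^2.
    n = X.shape[0]
    # Uniform subsampling of m columns (an assumption of this sketch).
    idx = rng.choice(n, size=m, replace=False)
    landmarks = X[idx]
    K_nm = kernel(X, landmarks)  # n x m block of the kernel matrix
    # Normal equations: (K_nm^T K_nm + n * lam * I) a = K_nm^T y.
    # The matrix on the left is positive definite for lam > 0,
    # whether or not the kernel itself is positive definite.
    A = K_nm.T @ K_nm + n * lam * np.eye(m)
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return landmarks, alpha


def nystrom_crr_predict(X_new, landmarks, alpha, kernel):
    # f(x) = sum_j alpha_j * K(x, landmark_j)
    return kernel(X_new, landmarks) @ alpha
```

Under these assumptions, fitting forms an m-by-m system from an n-by-m matrix, so it costs on the order of n·m^2 time and n·m memory, rather than the n^3 time and n^2 memory of a solve on the full kernel matrix. The subsampling level `m` is the quantity that would be tuned by cross-validation; this sketch refits from scratch for each `m` and does not reproduce the incremental algorithm described above.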
