Scalable Kernel Methods via Doubly Stochastic Gradients

Bo Dai, Yingyu Liang, Bo Xie, Anant Raj, Maria-Florina Balcan, Niao He, Le Song

doi:10.1184/r1/6476315.v1

Abstract

The general perception is that kernel methods are not scalable, so neural nets have become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called doubly stochastic functional gradients. Based on the fact that many kernel methods can be expressed as convex optimization problems, our approach solves the optimization problems by making two unbiased stochastic approximations to the functional gradient (one using random training points and another using random features associated with the kernel) and performing descent steps with this noisy functional gradient. Our algorithm is simple, does not need to commit to a preset number of random features, and allows the flexibility of the function class to grow as we see more incoming data in the streaming setting. We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at a rate of O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes that are dominated by neural nets. We show competitive performance of our approach compared to neural nets on datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features.
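
The two stochastic approximations described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian RBF kernel approximated by random Fourier features, a squared loss, and a decaying step size `theta / (t0 + t)`, none of which are fixed by the abstract. The function names (`features`, `fit_doubly_sgd`, `predict`) and all default parameter values are hypothetical, and the sampled frequencies are simply stored in arrays for readability.

```python
import numpy as np


def features(X, omegas, phases):
    """Random Fourier features sqrt(2) * cos(omega . x + b).

    Assumption: Gaussian RBF kernel, for which the product of two such
    features is an unbiased estimate of k(x, x').
    """
    return np.sqrt(2.0) * np.cos(X @ omegas.T + phases)


def fit_doubly_sgd(X, y, n_iters=3000, sigma=1.0, nu=1e-4,
                   theta=50.0, t0=1000.0, seed=0):
    """Sketch of doubly stochastic functional gradient descent (squared loss).

    Each iteration draws one random training point and one random feature,
    so the function class grows by one basis function per step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas = np.empty((n_iters, d))   # one sampled frequency per iteration
    phases = np.empty(n_iters)        # one sampled phase per iteration
    alphas = np.zeros(n_iters)        # coefficient of each random feature
    for t in range(n_iters):
        i = rng.integers(n)                                  # random data point
        omegas[t] = rng.normal(scale=1.0 / sigma, size=d)    # random feature
        phases[t] = rng.uniform(0.0, 2.0 * np.pi)
        # Prediction at x_i using the features added so far (0 when t == 0).
        f_xi = features(X[i], omegas[:t], phases[:t]) @ alphas[:t]
        gamma = theta / (t0 + t)      # assumed decaying step size
        # Shrink earlier coefficients (regularization), then add the new one.
        alphas[:t] *= 1.0 - gamma * nu
        phi_t = features(X[i], omegas[t:t + 1], phases[t:t + 1])[0]
        alphas[t] = -gamma * (f_xi - y[i]) * phi_t
    return omegas, phases, alphas


def predict(X, omegas, phases, alphas):
    """Evaluate the learned function f(x) = sum_t alpha_t * phi_t(x)."""
    return features(X, omegas, phases) @ alphas
```

A call such as `fit_doubly_sgd(X, y)` with `X` of shape `(n, d)` returns the sampled features and their coefficients, which `predict` then evaluates on new inputs. Storing the frequencies costs memory linear in the number of iterations; regenerating them from per-iteration random seeds would avoid that storage at the price of extra computation.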

Journal: Neural Information Processing Systems
Publication Date: Dec 8, 2014
Citations: 56
