Abstract

Kernel methods such as the standard support vector machine and support vector regression take O(N^3) time and O(N^2) space in their naive implementations, where N is the training set size. Applying them to large data sets is therefore computationally infeasible, and a replacement for the naive method of finding the quadratic programming (QP) solutions is highly desirable. Observing that many kernel methods can be linked to the kernel density estimate (KDE), which can be implemented efficiently with approximation techniques, a new learning method called fast KDE (FastKDE) is proposed to scale up kernel methods. It is based on establishing a connection between KDE and the QP problems formulated for kernel methods through an entropy-based integrated-squared-error criterion; as a result, FastKDE approximation methods can be applied to solve these QP problems. In this paper, the latest advance in fast data reduction via KDE is exploited. With just a simple sampling strategy, the resulting FastKDE method can scale up various kernel methods with a theoretical guarantee that their performance degrades only slightly. It has a time complexity of O(m^3), where m is the number of data points sampled from the training set. Experiments on different benchmark data sets demonstrate that the proposed method performs comparably to the state-of-the-art method and is effective for a wide range of kernel methods, enabling fast learning on large data sets.
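The complexity claim above lends itself to a short illustration. The sketch below is a minimal, hypothetical Python example of the "simple sampling strategy" idea: solve the kernel-method QP over m points drawn from the N-point training set, so training costs O(m^3) instead of O(N^3). It uses a plain uniform sampler and scikit-learn's SVC for concreteness; these are illustrative assumptions, not the paper's actual FastKDE sampler.

```python
# Sketch: scale up an RBF-kernel SVM by training on a subsample.
# Uniform sampling here stands in for the paper's KDE-based reduction.
import numpy as np
from sklearn.svm import SVC

def train_on_subsample(X, y, m, seed=None):
    """Train an RBF-kernel SVM on m points sampled from (X, y).

    X : ndarray of shape (N, d), y : ndarray of shape (N,).
    The QP is solved over only m points, giving O(m^3) time.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # simple sampling strategy
    clf = SVC(kernel="rbf")
    clf.fit(X[idx], y[idx])
    return clf

# Usage: with N = 100,000 training points, fitting on m = 1,000 sampled
# points keeps the QP tractable; the paper's analysis bounds how much
# performance can degrade under such a reduction.
```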
