Making SVMs Scalable to Large Data Sets using Hierarchical Cluster Indexing

Hwanjo Yu, Xiaolei Li, Jiong Yang, Jiawei Han

doi:10.1007/s10618-005-0005-7

Abstract

Support vector machines (SVMs) are promising methods for classification and regression analysis because of their solid mathematical foundations, which include two desirable properties: margin maximization and nonlinear classification using kernels. Despite these properties, however, SVMs are usually not chosen for large-scale data mining problems because their training complexity depends heavily on the size of the data set. Unlike traditional pattern recognition and machine learning problems, real-world data mining applications often involve huge numbers of data records, so it is too expensive to perform multiple scans of the entire data set, and it is also infeasible to hold the data set in memory. This paper presents a method, Clustering-Based SVM (CB-SVM), that maximizes SVM performance on very large data sets given limited resources, e.g., memory. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high-quality samples. These samples carry statistical summaries of the data and maximize the benefit of learning. Our analyses show that the training complexity of CB-SVM is quadratically dependent on the number of support vectors, which is usually much smaller than the size of the entire data set. Our experiments on synthetic and real-world data sets show that CB-SVM is highly scalable to very large data sets and highly accurate in classification.
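The single-scan summarization step can be illustrated with a minimal sketch. It assumes BIRCH-style clustering features, where each micro-cluster keeps only its point count N, linear sum LS, and scalar square sum SS, so that centroids and radii can be computed and clusters merged without revisiting raw records. The helpers `single_scan_microclusters` and `svm_training_samples` are hypothetical names, and the flat list of clusters is a simplified stand-in for the hierarchical index the paper uses; the full method goes on to refine these samples, which is not sketched here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClusteringFeature:
    """BIRCH-style micro-cluster summary: count, linear sum, square sum."""
    n: int
    ls: List[float]
    ss: float  # sum of squared Euclidean norms of the absorbed points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # R^2 = SS/N - ||LS/N||^2, the mean squared distance to the centroid
        c = self.centroid()
        r2 = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF vectors are additive, so merging never touches the raw points
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )


def single_scan_microclusters(points, threshold: float) -> List[ClusteringFeature]:
    """Hypothetical flat stand-in for a CF-tree: reads each point exactly once."""
    clusters: List[ClusteringFeature] = []
    for x in points:
        cf = ClusteringFeature.from_point(x)
        # find the micro-cluster whose centroid is closest to x
        best, best_d = None, math.inf
        for i, c in enumerate(clusters):
            d = math.dist(c.centroid(), x)
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            candidate = clusters[best].merged(cf)
            if candidate.radius() <= threshold:
                clusters[best] = candidate  # absorb x into the nearest cluster
                continue
        clusters.append(cf)  # otherwise x starts a new micro-cluster
    return clusters


def svm_training_samples(labeled_points, threshold: float) -> List[Tuple[List[float], int, int]]:
    """Summarize each class separately; return (centroid, label, weight) triples."""
    samples = []
    for label in sorted({y for _, y in labeled_points}):
        pts = [x for x, y in labeled_points if y == label]
        for cf in single_scan_microclusters(pts, threshold):
            samples.append((cf.centroid(), label, cf.n))
    return samples


data = [([0.0, 0.0], -1), ([0.1, 0.0], -1), ([0.0, 0.1], -1),
        ([5.0, 5.0], 1), ([5.1, 5.0], 1)]
samples = svm_training_samples(data, threshold=0.5)
```

With a radius threshold of 0.5, the five points above collapse into one weighted centroid per class, and the SVM would train on those few samples instead of the raw records. The per-cluster count N can serve as a sample weight for any trainer that accepts one (for example, scikit-learn's `SVC.fit(X, y, sample_weight=...)`); how CB-SVM itself weights or refines the samples is not shown here.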

Journal: Data Mining and Knowledge Discovery
Publication Date: Aug 19, 2005
Citations: 86

Similar Papers

Classifying large data sets using SVMs with hierarchical clusters
Hwanjo Yu ... Jiong Yang, 24 Aug 2003

Ensemble Learning with Support Vector Machines for Bond Rating
01 Jan 2012

Data-Driven Machine Learning Approach for Predicting Missing Values in Large Data Sets: A Comparison Study
Ogerta Elezaj ... Sule Yildirim, 21 Dec 2017

Contributions to k-means clustering and regression via classification algorithms
..., 12 Jul 2014