Large-scale data classification based on hierarchical clustering and re-sampling

Yong Zhang, Panpan Fu, Yuting Zhang

doi:10.3724/sp.j.1087.2013.02801

Abstract

This paper presents a Support Vector Machine (SVM) classification method for large-scale data, based on hierarchical clustering and re-sampling, that combines supervised and unsupervised learning. The method first uses k-means clustering to partition the dataset into several subsets. It then clusters each subset class by class and selects the samples in the neighborhood of each cluster center to form a candidate training dataset. Finally, an SVM is trained on the candidate training dataset. The experimental results show that the proposed method substantially reduces SVM learning cost. It also achieves better classification accuracy than random re-sampling and about the same classification accuracy as training without sampling.
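The sample-selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the neighborhood of a cluster center, so the sketch assumes it means the `keep` samples nearest to that center, and the function names and the parameters `n_subsets`, `n_clusters`, `keep`, and `seed` are hypothetical. The sketch covers only the two clustering levels and the selection; the returned indices would then be passed to any SVM trainer.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns the centers."""
    rng = random.Random(seed)
    k = min(k, len(points))  # cannot have more centers than points
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group every point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            [sum(c) / len(g) for c in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def select_candidates(X, y, n_subsets=2, n_clusters=2, keep=3, seed=0):
    """Return sorted indices of the samples kept for SVM training.

    Level 1: k-means splits the whole dataset into subsets.
    Level 2: inside each subset, each class is clustered separately.
    Selection (assumption): keep the `keep` samples nearest each center.
    """
    coarse = kmeans(X, n_subsets, seed=seed)
    subset_of = [nearest(p, coarse) for p in X]
    chosen = set()
    for s in range(len(coarse)):
        for cls in sorted(set(y)):
            idx = [i for i in range(len(X)) if subset_of[i] == s and y[i] == cls]
            if not idx:
                continue  # this class does not occur in this subset
            centers = kmeans([X[i] for i in idx], n_clusters, seed=seed)
            for c in centers:
                by_distance = sorted(idx, key=lambda i: math.dist(X[i], c))
                chosen.update(by_distance[:keep])
    return sorted(chosen)
```

With the default arguments, at most `n_subsets * n_classes * n_clusters * keep` samples are retained, so the size of the SVM training set is bounded by the clustering parameters rather than by the size of the original dataset.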
