Abstract

Compression of training sets is a technique for reducing training set size without degrading classification accuracy. By reducing the size of a training set, training becomes more efficient and storage space is saved. In this paper, an incremental clustering algorithm, the Leader algorithm, is used to reduce the size of a training set by effectively subsampling it. Experiments on several standard data sets using SVM and KNN as classifiers indicate that the proposed method is more efficient than CONDENSE in reducing the size of the training set without degrading classification accuracy. While the compression ratio of the CONDENSE method is fixed, the proposed method offers a variable compression ratio through the cluster threshold value.

Keywords: Clustering, Support vector machine, KNN, Pattern recognition, CONDENSE.

1. Introduction

The training and/or testing complexity of a classifier usually depends on the size of the training set, e.g. the nearest neighbor (NN) classifier [1]. Nearest neighbor and its generalized form, the K-nearest neighbor (KNN) classifier, are among the most popular non-parametric classifiers. The membership of an unknown sample is decided by the majority vote of its K nearest neighbors. There is no explicit learning from the training set; the entire training set itself defines the decision boundaries. KNN is conceptually simple and shows good performance in many applications, e.g. it was used in face recognition for visitor identification [2], where it outperformed more sophisticated algorithms based on Principal Components Analysis (PCA) and neural networks. Unfortunately, when the training set is large, storing it requires a lot of memory, and searching for the nearest neighbors of a given test pattern to make a single membership classification takes longer. Obviously, reducing the size of a training set can improve the space and time efficiency of KNN. There has been considerable interest in reducing the training set size by editing, especially in the context of NN. Different proximity graphs (such as the Delaunay triangulation) may be used for editing NN rules [3,4], but the complexities of such approaches are prohibitively high. For example, the Voronoi diagram has a worst-case complexity of Θ(n^⌈d/2⌉) for n points in d dimensions.
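The following minimal Python sketch illustrates the two ideas this section relies on: a single-pass, Leader-style subsampling of the training set controlled by a distance threshold, followed by majority-vote KNN classification on the retained patterns. The function names, the Euclidean metric, and the choice to keep only cluster leaders as representatives are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: Leader-style single-pass subsampling of a training set,
# then majority-vote KNN on the retained leaders. Names, metric, and the
# "keep only leaders" policy are assumptions for illustration only.
from collections import Counter
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_subsample(X, y, threshold):
    """Single pass over (X, y): a pattern farther than `threshold` from every
    existing leader becomes a new leader; otherwise it is absorbed and dropped.
    A larger threshold keeps fewer leaders, i.e. a higher compression ratio."""
    leaders, labels = [], []
    for xi, yi in zip(X, y):
        if all(euclidean(xi, leader) > threshold for leader in leaders):
            leaders.append(xi)
            labels.append(yi)
    return leaders, labels


def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote over its k nearest training patterns."""
    ranked = sorted(zip(X_train, y_train), key=lambda p: euclidean(x, p[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D data: two well-separated classes.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    y = ["a", "a", "a", "b", "b", "b"]
    leaders, labels = leader_subsample(X, y, threshold=0.5)
    print(len(leaders), "leaders kept out of", len(X), "training patterns")
    print(knn_predict((0.15, 0.15), leaders, labels, k=1))
```

Under these assumptions, the threshold plays the role the abstract attributes to the cluster threshold value: tightening it retains more representatives, and relaxing it compresses the training set further.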
