Abstract

This paper reports the scaling behavior of the fuzzy c-means based classifier (FCMC), which has a large number of parameters. FCMC is a classifier based on a clustering approach. Increasing the number of clusters does not necessarily improve the classification accuracy on test sets (i.e., the generalization capability). In particular, when the number of training samples is relatively small, the classification boundary over-fits the data, and the covariance matrices and cluster centers are estimated poorly because each cluster contains fewer samples; hence, the test set accuracy deteriorates. The performance of FCMC with two clusters per class and fewer than 1000 training samples has been reported in the literature. This paper examines the scaling behavior of FCMC on training sets of various sizes, with the number of clusters increased to as many as eight. Although this number of clusters is not very large, the number of parameters is relatively large, so the parameters are optimized on the training sets. LibSVM is a widely known state-of-the-art tool for support vector machines (SVMs). The test set accuracy, training time, and testing time (i.e., the detection time) of FCMC are compared with those of LibSVM while varying the size of the training sets. FCMC shows good generalization capability even though its parameters are optimized on the training sets. When the number of training samples increases by a factor of 10, the training time of FCMC increases by a factor of 10, whereas that of LibSVM increases by a factor of 100. The testing time of FCMC is also much shorter than that of LibSVM when the training set is large.
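The sketch below is a minimal illustration of a clustering-based classifier in the spirit of FCMC, not the authors' exact formulation: each class is modelled by a small number of fuzzy c-means clusters, each cluster keeps a membership-weighted center and covariance matrix, and a test point is assigned to the class of its best-scoring cluster. The fuzzifier value, the covariance regularization, and the Gaussian-style scoring rule are assumptions made for this sketch.

```python
# Illustrative clustering-based classifier (NOT the paper's exact FCMC).
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=None):
    """Plain fuzzy c-means: returns cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(n_clusters), size=n)            # random memberships
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]         # weighted means
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        U_new = 1.0 / (d2 ** (1.0 / (m - 1)))                  # standard FCM update
        U_new /= U_new.sum(axis=1, keepdims=True)              # normalise rows
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U


class ClusterClassifier:
    """Fit a few fuzzy clusters per class; classify by the best Gaussian log-score."""

    def __init__(self, clusters_per_class=2, m=2.0):
        self.clusters_per_class = clusters_per_class
        self.m = m

    def fit(self, X, y):
        self.clusters_ = []                                    # (label, center, inv_cov, logdet)
        for label in np.unique(y):
            Xc = X[y == label]
            centers, U = fuzzy_c_means(Xc, self.clusters_per_class, self.m)
            for k in range(self.clusters_per_class):
                w = (U[:, k] ** self.m)[:, None]
                diff = Xc - centers[k]
                cov = (w * diff).T @ diff / w.sum()            # membership-weighted covariance
                cov += 1e-6 * np.eye(X.shape[1])               # regularise (assumption)
                self.clusters_.append(
                    (label, centers[k], np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
                )
        return self

    def predict(self, X):
        preds = []
        for x in X:
            best, best_score = None, -np.inf
            for label, c, inv_cov, logdet in self.clusters_:
                d = x - c
                score = -0.5 * (d @ inv_cov @ d) - 0.5 * logdet  # Gaussian-style score
                if score > best_score:
                    best, best_score = label, score
            preds.append(best)
        return np.array(preds)
```

A comparison in the style of the paper would fit this model and an SVM (e.g., LibSVM) on training sets of increasing size and record the training time, testing time, and test set accuracy of each.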
