Data clustering with mixed features by multi objective genetic algorithm

Dipankar Dutta, Jaya Sil, Paramartha Dutta

doi:10.1109/his.2012.6421357

Abstract

This paper studies a real-coded multi-objective genetic algorithm (MOGA) based K-clustering method, where K, the number of clusters, is known a priori. The proposed method can handle data sets with both continuous and categorical features (mixed features). Clusters are commonly represented by the means of continuous features and the modes of categorical features; for this reason, K-means and K-modes are the most popular clustering algorithms for continuous and categorical features, respectively. The searching power of the genetic algorithm (GA) is exploited to find suitable clusters and cluster centroids (means or modes) so that intra-cluster distance (Homogeneity, H) and inter-cluster distance (Separation, S) are optimized simultaneously. This is achieved by measuring H and S with a special distance-per-feature metric that is suitable for both continuous and categorical features. We selected four benchmark data sets from the UCI Machine Learning Repository that contain both continuous and categorical features. K-means and K-modes are hybridized with the GA to combine the global search capability of the GA with the local search capability of K-means and K-modes. Considering context sensitivity, we use special operators called “pairwise crossover” and “substitution”.
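
To make the two objectives concrete, the following is a minimal Python sketch of how H and S might be computed over mixed features. The abstract does not define the distance-per-feature metric, so the version here is an assumption: continuous features are taken to be scaled to [0, 1] and contribute their absolute difference, categorical features contribute 0 on a match and 1 otherwise, and the sum is divided by the number of features. Treating H as the sum of point-to-centre distances and S as the sum of pairwise centre distances is likewise an illustrative choice, and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def distance_per_feature(a, b, is_categorical):
    """Assumed mixed-feature distance, averaged over the features."""
    total = 0.0
    for x, y, cat in zip(a, b, is_categorical):
        if cat:
            total += 0.0 if x == y else 1.0   # simple mismatch for categories
        else:
            total += abs(x - y)               # assumes values scaled to [0, 1]
    return total / len(a)

def center(points, is_categorical):
    """Cluster centre: mode for categorical columns, mean for continuous ones."""
    columns = zip(*points)
    return tuple(
        Counter(col).most_common(1)[0][0] if cat else mean(col)
        for col, cat in zip(columns, is_categorical)
    )

def homogeneity(points, labels, centers, is_categorical):
    """H (assumed form): total distance of points to their own centre; lower is better."""
    return sum(
        distance_per_feature(p, centers[k], is_categorical)
        for p, k in zip(points, labels)
    )

def separation(centers, is_categorical):
    """S (assumed form): total pairwise distance between centres; higher is better."""
    n = len(centers)
    return sum(
        distance_per_feature(centers[i], centers[j], is_categorical)
        for i in range(n) for j in range(i + 1, n)
    )

# Toy data: one continuous feature and one categorical feature, K = 2.
is_cat = (False, True)
points = [(0.1, "red"), (0.2, "red"), (0.9, "blue"), (0.8, "blue")]
labels = [0, 0, 1, 1]
centers = [
    center([p for p, k in zip(points, labels) if k == c], is_cat)
    for c in range(2)
]
H = homogeneity(points, labels, centers, is_cat)
S = separation(centers, is_cat)
```

In a multi-objective GA, each chromosome would encode a candidate set of K centres, and a pair such as (H, S) would serve as its fitness vector, with H minimized and S maximized.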

Full Text