Abstract

In this paper we couple the feature selection problem with semi-supervised clustering. Semi-supervised clustering techniques are used to overcome the problems associated with purely unsupervised and purely supervised classification. In general, however, not all features present in a data set are important for clustering, so selecting an appropriate subset of features is highly relevant from a clustering point of view. A recently developed multiobjective simulated annealing based optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization technique. Features and cluster centers are encoded together in the form of a string. We assume that class label information is known for 10% of the data points in each data set. Four objective functions are used. The first two represent, respectively, the total symmetry present in the clusters and the total compactness of the partitioning; they are based on point symmetry and Euclidean distance computations. The third is an external cluster validity index that measures the similarity between the clustering obtained on the labeled data and the original labeling, and the fourth counts the number of features. Our objective is to optimize the values of the cluster validity indices while increasing the number of features, in order to remove the bias of internal cluster validity indices towards lower dimensions. AMOSA is used to detect the appropriate subset of features, the actual number of clusters, and the true partitioning. Data points are assigned to their respective clusters using a new point symmetry distance based methodology. Mutation changes the feature combination as well as the set of cluster centers. We also implement a novel method to select a single solution from the Pareto-optimal front.
The proposed Semi-FeaClustMOO technique is thus designed to obtain the actual number of clusters as well as the true partitioning. Its efficacy is shown on three real-life data sets and compared with the genetic algorithm based VGAPS clustering technique and the K-means clustering technique. These two clustering techniques work with all the available features of the data sets, whereas Semi-FeaClustMOO uses a subset of features during the computation.
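The encoding described in the abstract, where one candidate solution carries both a feature subset and a set of cluster centers, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `decode`, `assign`, and `objectives` are hypothetical, the feature subset is represented as a boolean mask, and plain Euclidean distance stands in for the point symmetry distance, whose definition the abstract does not give. Only the compactness and feature-count objectives are sketched; the symmetry-based and external validity objectives are omitted.

```python
import numpy as np

def decode(mask, centers):
    """Split a candidate solution into selected feature indices and
    the cluster centers restricted to those features."""
    idx = np.flatnonzero(mask)
    return idx, centers[:, idx]

def assign(points, mask, centers):
    """Assign each point to its nearest center in the selected subspace.
    ASSUMPTION: Euclidean distance replaces the point symmetry distance."""
    idx, sub_centers = decode(mask, centers)
    sub_points = points[:, idx]
    # Pairwise distances, shape (n_points, n_clusters)
    dists = np.linalg.norm(sub_points[:, None, :] - sub_centers[None, :, :], axis=2)
    return dists.argmin(axis=1), dists.min(axis=1)

def objectives(points, mask, centers):
    """Return cluster labels, total compactness (sum of point-to-center
    distances), and the number of selected features."""
    labels, nearest = assign(points, mask, centers)
    compactness = float(nearest.sum())
    n_features = int(mask.sum())
    return labels, compactness, n_features

# Toy data: the third feature is noise and is switched off by the mask.
points = np.array([[0.0, 0.0, 5.0],
                   [0.0, 1.0, -5.0],
                   [10.0, 10.0, 5.0],
                   [10.0, 11.0, -5.0]])
mask = np.array([True, True, False])
centers = np.array([[0.0, 0.5, 0.0],
                    [10.0, 10.5, 0.0]])

labels, compactness, n_features = objectives(points, mask, centers)
print(labels, compactness, n_features)
```

In a multiobjective setting such as the one described, a mutation step would perturb either `mask` or `centers`, and the resulting objective values would be compared by Pareto dominance rather than combined into a single score.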
