Abstract

Clustering is unsupervised learning where ideally class levels and number of clusters (K) are not known. K-clustering can be categorized as semi-supervised learning where K is known. Here we have considered K-Clustering with simultaneous feature selection. Feature subset selection helps to identify relevant features for clustering, increase understandability, better scalability and improve accuracy. Here we have used two measures, intra-cluster distance (Homogeneity, H) and inter-cluster distances (Separation, S) for clustering. Measures are using mod distance per feature suitable for categorical features (attributes). Rather than combining H and S to frame the problem as single objective optimization problem, we use multi objective genetic algorithm (MOGA) to find out diverse solutions near to Pareto optimal front in the two-dimensional objective space. Each evolved solution represents a set of cluster modes (CMs) build by selected feature subset. Here, K-modes is hybridized with MOGA. We have used hybridized GA to combine global searching powers of GA with local searching powers of K-modes. Considering context sensitivity, we have used a special crossover operator called “pairwise crossover” and “substitution”. The main contribution of this paper is simultaneous dimensionality reduction and optimization of objectives using MOGA. Results on 3 benchmark data sets from UCI Machine Learning Repository containing categorical features shows the superiority of the algorithm.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.