Abstract

Emerging high-dimensional data sets have made traditional clustering algorithms increasingly inefficient. More sophisticated approaches are required to cope with the increasing dimensionality and cardinality of such data sets. Feature selection methods have been proposed as a solution to this problem; however, they fail for data sets in which the attribute support differs between clusters. For this category of data sets, subspace clustering algorithms have been introduced over the past decade. We approach this problem from the perspective of Genetic Algorithms by adopting a hierarchical data structure deployed in three stages: 1) applying a traditional clustering algorithm independently to each attribute of the data set, thus defining a grid of potential 1-d cluster centroids; 2) representing multi-dimensional cluster centroids by indexing the 1-d cluster centroids; and 3) converting the problem of finding the best combination of cluster centroids into one of discrete optimization and applying a multi-objective evolutionary algorithm, which uses group fitness evaluation to assign a fitness to a group of clusters, as defined in stage 2. Synthetic data sets with different characteristics are generated as ground truth to evaluate the resulting algorithm, Evolutionary Subspace Clustering (ESC), and to benchmark it against alternative subspace and full-space clustering algorithms. ESC returns competitive accuracy while typically utilizing fewer attributes and scaling as the attribute count increases.
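
As an illustration only, stages 1 and 2 might be sketched as follows. This is not the ESC implementation: the function names (`kmeans_1d`, `build_grid`, `decode`), the choice of plain 1-d k-means as the "traditional clustering algorithm", the fixed number of 1-d centroids per attribute, and the use of `None` to mark an attribute that a cluster does not use are all assumptions made for this sketch. Stage 3, the multi-objective evolutionary search over such index vectors, is omitted.

```python
from typing import Dict, List, Optional, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 20) -> List[float]:
    """Plain 1-d k-means (Lloyd's algorithm) with a deterministic start."""
    ordered = sorted(values)
    n = len(ordered)
    # Spread the initial centroids evenly over the sorted values.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        buckets: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # Keep the old centroid when a bucket ends up empty.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)


def build_grid(data: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Stage 1 (assumed form): cluster every attribute on its own."""
    n_attrs = len(data[0])
    return [kmeans_1d([row[a] for row in data], k) for a in range(n_attrs)]


def decode(grid: List[List[float]],
           genes: Sequence[Optional[int]]) -> Dict[int, float]:
    """Stage 2 (assumed form): genes[a] indexes a 1-d centroid of attribute a;
    None means attribute a is not part of this cluster's subspace."""
    return {a: grid[a][g] for a, g in enumerate(genes) if g is not None}


# Toy data: attributes 0 and 1 carry cluster structure, attribute 2 is noise.
data = [(0.0, 10.0, 5.0), (0.2, 10.2, 1.0), (5.0, 20.0, 9.0), (5.2, 20.2, 3.0)]
grid = build_grid(data, k=2)

# A candidate cluster centroid that lives in the subspace {0, 1} only.
centroid = decode(grid, (0, 0, None))
print(centroid)
```

Under this encoding a candidate cluster is just a short vector of integers, which is what makes the search in stage 3 a discrete optimization problem.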
