A multilevel genetic algorithm for the clustering problem

Noureddine Bouhmala

doi:10.1504/ijict.2016.077692

Abstract

Data mining is concerned with the discovery of interesting patterns and knowledge in data repositories. Cluster analysis, one of the core methods of data mining, is the process of discovering homogeneous groups called clusters. Given a dataset and some measure of similarity between data objects, the goal of most clustering algorithms is to maximise both the homogeneity within each cluster and the heterogeneity between different clusters. The multilevel paradigm suggests a hierarchical optimisation process that moves through different levels, evolving from a coarse-grained to a fine-grained strategy. The clustering problem is solved by first reducing the problem, level by level, to a coarser problem for which an initial clustering is computed. The clustering of the coarser problem is then mapped back, level by level, to obtain a better clustering of the original problem by refining the intermediate clusterings obtained at the various levels. In this paper, a multilevel genetic algorithm and a multilevel K-means algorithm are introduced for solving the clustering problem. A benchmark of datasets collected from a variety of domains is used to compare the effectiveness of the hierarchical approach against its single-level counterpart.
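The coarsen, cluster, and refine cycle described in the abstract can be illustrated with a minimal Python sketch of a multilevel K-means. It is not the paper's implementation: the nearest-neighbour pairing used for coarsening, the unweighted pair means, the stopping size min_size, and all function names (coarsen, kmeans, multilevel_kmeans) are assumptions made for illustration only.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points]

def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    labels = assign(points, centroids)
    for _ in range(iters):
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[j])
        new_labels = assign(points, new_centroids)
        centroids = new_centroids
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def coarsen(points, rng):
    """Assumed coarsening rule: visit points in random order and merge each
    with its nearest unmatched neighbour; a pair is replaced by its mean."""
    order = list(range(len(points)))
    rng.shuffle(order)
    unmatched = set(order)
    groups = []
    for i in order:
        if i not in unmatched:
            continue
        unmatched.remove(i)
        if unmatched:
            j = min(unmatched, key=lambda u: dist2(points[i], points[u]))
            unmatched.remove(j)
            groups.append([i, j])
        else:
            groups.append([i])  # odd point out stays on its own
    coarse = [mean([points[i] for i in g]) for g in groups]
    return coarse, groups

def multilevel_kmeans(points, k, min_size=8, seed=0):
    rng = random.Random(seed)
    levels, maps = [points], []
    # Coarsening phase: shrink the problem level by level.
    while len(levels[-1]) > max(min_size, 2 * k):
        coarse, groups = coarsen(levels[-1], rng)
        levels.append(coarse)
        maps.append(groups)
    # Initial clustering of the coarsest problem.
    coarsest = levels[-1]
    labels, centroids = kmeans(coarsest, rng.sample(coarsest, k))
    # Uncoarsening phase: project labels to the finer level, then refine.
    for level in range(len(maps) - 1, -1, -1):
        finer = levels[level]
        inherited = [0] * len(finer)
        for parent, group in enumerate(maps[level]):
            for child in group:
                inherited[child] = labels[parent]
        start = []
        for j in range(k):
            members = [p for p, l in zip(finer, inherited) if l == j]
            start.append(mean(members) if members else centroids[j])
        labels, centroids = kmeans(finer, start)
    return labels, centroids
```

Calling multilevel_kmeans(points, k=2) on a list of coordinate tuples returns one label per input point together with the final centroids. In this sketch the single-level counterpart would simply be kmeans applied to the original points from randomly sampled starting centroids.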
