Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study

Absalom E Ezugwu

doi:10.1007/s42452-020-2073-0

Abstract

Over the past few decades, swarm intelligence and evolutionary metaheuristic algorithms have gained wide acceptance in data clustering, owing to their success in finding good-quality solutions to a variety of complex real-world optimization problems. Clustering is considered one of the most important data analysis techniques in the domain of data mining. A clustering problem refers to the partitioning of unlabeled data objects into a certain number of clusters based on their attribute values or features, with the objective of maximizing intra-cluster homogeneity and inter-cluster heterogeneity. This paper presents an up-to-date survey of major nature-inspired metaheuristic algorithms that have been employed to solve automatic clustering problems. Further, a comparative study of several modified, well-known global metaheuristic algorithms is carried out on automatic clustering problems. In addition, three hybrid swarm intelligence and evolutionary algorithms, namely the particle swarm differential evolution algorithm, the firefly differential evolution algorithm and the invasive weed optimization differential evolution algorithm, are proposed to deal with the task of automatic data clustering. In contrast to many of the existing traditional and evolutionary computational clustering techniques, the clustering algorithms presented in this paper do not require any predetermined information or prior knowledge of the dataset that is to be clustered; rather, they are capable of identifying the optimal number of partitions of the data points during the course of program execution. Forty-one benchmark datasets, comprising eleven artificial and thirty real-world datasets, are collated and used to evaluate the performance of the representative nature-inspired clustering algorithms. According to the extensive experimental results, comparisons and statistical significance tests, the firefly algorithm appears better suited to clustering both low- and high-dimensional data objects than the other state-of-the-art algorithms. Further, an experimental study demonstrates the superiority of the three proposed hybrid algorithms over the standard state-of-the-art methods in finding meaningful clustering solutions to the problem at hand.
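To make the notion of automatic clustering concrete, the following minimal Python sketch scores candidate partitions that use different numbers of clusters and keeps the best one. It is an illustration under stated assumptions, not the method evaluated in this paper: the Davies-Bouldin index is assumed as the cluster validity measure (the abstract does not name one), the function names `davies_bouldin` and `assign` are hypothetical, and the candidate centers are hand-picked here, whereas a metaheuristic would search for them.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def assign(data, centers):
    """Label each point with the index of its nearest candidate center."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in data]

def davies_bouldin(data, labels):
    """Davies-Bouldin index (assumed validity measure); lower is better."""
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        return math.inf  # the index is undefined for a single cluster
    centers = {k: centroid(v) for k, v in clusters.items()}
    # Scatter: mean distance of members to their own centroid (homogeneity).
    scatter = {k: sum(math.dist(p, centers[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    total = 0.0
    for i in clusters:
        worst = 0.0
        for j in clusters:
            if j == i:
                continue
            gap = math.dist(centers[i], centers[j])  # separation (heterogeneity)
            if gap == 0.0:
                return math.inf  # coincident centroids: reject this partition
            worst = max(worst, (scatter[i] + scatter[j]) / gap)
        total += worst
    return total / len(clusters)

# Toy data: two well-separated square blobs.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11)]

# Hand-picked candidate solutions with different cluster counts (assumption:
# a metaheuristic would generate and refine such candidates itself).
candidates = {
    2: [(0.5, 0.5), (10.5, 10.5)],
    3: [(0.0, 0.5), (1.0, 0.5), (10.5, 10.5)],
    4: [(0.0, 0.5), (1.0, 0.5), (10.0, 10.5), (11.0, 10.5)],
}
scores = {k: davies_bouldin(data, assign(data, c)) for k, c in candidates.items()}
best_k = min(scores, key=scores.get)
```

Because the number of clusters is part of each candidate rather than a fixed input, minimizing the index selects both the partition and the cluster count; on this toy data the two-cluster candidate obtains the lowest score.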

Full Text