Abstract

In many practical applications, it is crucial to perform automatic data clustering without knowing the number of clusters in advance. The evolutionary computation paradigm is good at dealing with this task, but the existing algorithms suffer from several deficiencies, such as encoding redundancy and the cross-dimension learning error. In this article, we propose a novel elastic differential evolution algorithm to solve automatic data clustering. Unlike traditional methods, the proposed algorithm considers each clustering layout as a whole and inherently adapts both the number of clusters and the cluster centroids through its variable-length encoding and evolutionary operators. The encoding scheme contains no redundancy. To let individuals of different lengths exchange information properly, we develop a subspace crossover and a two-phase mutation operator. The operators employ the basic method of differential evolution and, in addition, consider the spatial information of cluster layouts to generate offspring solutions. In particular, each dimension of the parameter vector interacts only with its correlated dimensions, which not only adapts the cluster number but also avoids the cross-dimension learning error. The experimental results show that our algorithm outperforms the state-of-the-art algorithms in that it is able to identify the correct number of clusters and obtain a good cluster validation value.
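The idea that each dimension should interact only with its correlated dimensions can be illustrated with a small sketch. It assumes each individual is a list of centroids (the list length is its cluster number). The hypothetical `matched_mutation` applies a DE/rand/1-style step in which every centroid of the base individual learns from the spatially nearest centroid of two other individuals, so a coordinate is only ever combined with the same coordinate of a nearby centroid. This is not the paper's two-phase mutation: it keeps the base individual's cluster number fixed, and the scale factor `F = 0.5` is an arbitrary choice.

```python
from typing import List, Tuple

Centroid = Tuple[float, ...]
Individual = List[Centroid]  # variable length: one entry per cluster


def squared_distance(p: Centroid, q: Centroid) -> float:
    """Squared Euclidean distance between two centroids."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def nearest(c: Centroid, layout: Individual) -> Centroid:
    """Return the centroid in `layout` closest to `c` (its spatially correlated partner)."""
    return min(layout, key=lambda q: squared_distance(c, q))


def matched_mutation(base: Individual, r1: Individual, r2: Individual,
                     F: float = 0.5) -> Individual:
    """Hypothetical DE/rand/1-style mutation for individuals of different lengths.

    Each base centroid c becomes c + F * (a - b), where a and b are the centroids
    of r1 and r2 nearest to c, so coordinates are never mixed across unrelated
    centroid slots. The mutant keeps len(base) clusters.
    """
    mutant = []
    for c in base:
        a, b = nearest(c, r1), nearest(c, r2)
        mutant.append(tuple(ci + F * (ai - bi) for ci, ai, bi in zip(c, a, b)))
    return mutant


base = [(0.0, 0.0), (10.0, 10.0)]              # 2 clusters
r1 = [(1.0, 1.0), (9.0, 9.0), (20.0, 20.0)]    # 3 clusters
r2 = [(0.5, 0.5), (10.0, 10.0)]                # 2 clusters
print(matched_mutation(base, r1, r2))  # [(0.25, 0.25), (9.5, 9.5)]
```

Pairing by nearest centroid is only one simple way to find correlated dimensions; it lets individuals of different lengths interact without padding them to a common length.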

Highlights

  • Data clustering plays a crucial role in various fields, such as industrial informatics [1], [2]; bioinformatics [3]–[6]; and pattern recognition [7]–[9].

  • We focus on partitional clustering, which organizes the data objects into a number of mutually exclusive clusters.

  • E-DE (elastic differential evolution) adopts an elastic encoding scheme in which the population consists of variable-length parameter vectors, each of which may encode a different number of clusters.
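A minimal sketch of such an elastic encoding, assuming a parameter vector simply concatenates the coordinates of its k centroids, so that a vector of length k·d encodes k clusters in d dimensions. The helper names `decode` and `assign` are hypothetical. Every gene is a centroid coordinate (there are no switch genes), and decoding a vector yields a partition of the data into mutually exclusive clusters by nearest-centroid assignment.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def decode(vector: Sequence[float], d: int) -> List[Point]:
    """Split a variable-length vector of length k*d into k centroids of dimension d."""
    if len(vector) == 0 or len(vector) % d != 0:
        raise ValueError("vector length must be a positive multiple of d")
    return [tuple(vector[i:i + d]) for i in range(0, len(vector), d)]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid (exclusive clusters)."""
    def sq_dist(p: Point, q: Point) -> float:
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


# Two individuals of different lengths in the same population (d = 2).
population = [
    [0.0, 0.0, 5.0, 5.0],            # k = 2
    [0.0, 0.0, 5.0, 5.0, 9.0, 0.0],  # k = 3
]
points = [(0.1, 0.2), (4.9, 5.1), (8.8, 0.3)]
for vector in population:
    centroids = decode(vector, d=2)
    print(len(centroids), assign(points, centroids))
# 2 [0, 1, 1]
# 3 [0, 1, 2]
```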


Summary

INTRODUCTION

Data clustering plays a crucial role in various fields, such as industrial informatics [1], [2]; bioinformatics [3]–[6]; and pattern recognition [7]–[9]. For automatic data clustering, many evolutionary computation (EC) algorithms adopt a fixed-length structure that encodes the coordinates of all possible cluster centroids and uses a switch vector to represent the state (activated or not) of each centroid. The existing differential evolution algorithms for automatic data clustering likewise use a fixed-length encoding scheme and rely on extra dimensions to represent the activation state of the cluster centroids. With such an encoding, a dimension of one individual can learn from an uncorrelated dimension of another, which is the cross-dimension learning error. This error disrupts the interactions between individuals and reduces the performance of a population-based optimization approach.
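The redundancy and the cross-dimension learning error of a fixed-length scheme can be illustrated with a small sketch. It assumes a vector of `K_MAX` activation values followed by `K_MAX` centroid slots, with a slot counted as active when its value exceeds 0.5; this layout, the threshold, and the name `decode_fixed` are illustrative assumptions rather than any specific published algorithm. Two individuals that describe the same two clusters in swapped slot order still yield a nonzero slot-wise DE difference, so a centroid coordinate is pushed by an unrelated centroid, while the genes of the inactive third slot are carried along without affecting the clustering.

```python
from typing import List, Sequence, Tuple

K_MAX = 3  # assumed maximum number of clusters
D = 2      # data dimensionality


def decode_fixed(vector: Sequence[float], k_max: int = K_MAX, d: int = D,
                 threshold: float = 0.5) -> List[Tuple[float, ...]]:
    """Return the active centroids of a fixed-length [switches | centroids] vector."""
    switches, coords = vector[:k_max], vector[k_max:]
    return [tuple(coords[j * d:(j + 1) * d])
            for j in range(k_max) if switches[j] > threshold]


#     switches          slot 0      slot 1      slot 2 (inactive)
a = [0.9, 0.7, 0.1,  0.0, 0.0,  5.0, 5.0,  3.0, 3.0]
b = [0.7, 0.9, 0.2,  5.0, 5.0,  0.0, 0.0,  7.0, 1.0]

# Both individuals encode the same two clusters, only in a different slot order.
print(sorted(decode_fixed(a)) == sorted(decode_fixed(b)))  # True

# Yet the slot-wise DE difference over the centroid genes is far from zero.
diff = [x - y for x, y in zip(a[K_MAX:], b[K_MAX:])]
print(diff)  # [-5.0, -5.0, 5.0, 5.0, -4.0, 2.0]
```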

Related Work
Validation Indices for Clustering
PROPOSED ALGORITHM
Elastic Encoding Scheme
Population Initialization
Mutation
Crossover
Selection
Experimental Setup
Experimental Results
Analysis of Time Complexity
Findings
CONCLUSION