Abstract

Clustering is one of the most relevant data mining tasks; it aims to process data sets effectively. This paper introduces a new clustering heuristic that combines the E-transitive heuristic, adapted to quantitative data, with the k-means algorithm, with the goal of determining the optimal number of clusters and suitable initial cluster centres for k-means. The suggested heuristic, called PFK-means, is a parameter-free clustering algorithm because it does not require the number of clusters to be specified in advance. Instead, it progressively generates the initial cluster centres until the appropriate number of clusters is detected automatically. Moreover, this paper presents a thorough comparison between the PFK-means heuristic, its variants, the E-transitive heuristic for clustering quantitative data, and the traditional k-means in terms of the sum of squared errors and accuracy on different data sets. The experimental results reveal that, in general, the proposed heuristic and its variants find the appropriate number of clusters for different real-world data sets and yield good cluster quality relative to the traditional k-means. Furthermore, experiments conducted on synthetic data sets report the performance of this heuristic in terms of processing time.

Highlights

  • In the last few years, the digital world has undergone rapid and unprecedented global change, driven by the growth of the connected-objects market (known as the Internet of Things), the continued expansion of social networks, and the heavy use of large e-commerce sites, among other factors.

  • An overall comparison was established between the k-means-based parameter-free clustering algorithm, its variants, the E-transitive heuristic [2] adapted to quantitative data, the iterative k-means minusplus [5], and the traditional k-means [3][4] in terms of the sum of squared errors and accuracy, using different UCI data sets [6].

  • The proposed heuristic, named PFK-means, is a parameter-free clustering algorithm that combines the E-transitive heuristic [2], adapted to quantitative data, with the traditional k-means [3][4].

Summary

Introduction

In the last few years, the digital world has undergone rapid and unprecedented global change, driven by the growth of the connected-objects market (known as the Internet of Things), the continued expansion of social networks, and the heavy use of large e-commerce sites, among other factors. This digital explosion challenges researchers to find appropriate techniques and efficient algorithms to analyze and process the considerable amount of data arising from those sources, extract relevant information, and facilitate decision-making. Different clustering algorithms can produce different clustering results for the same data set.
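Even a single algorithm can give different results on the same data: the outcome of the traditional k-means depends on its initial cluster centres. The sketch below illustrates this with a plain Lloyd-style k-means and the sum of squared errors (SSE). It is not the PFK-means heuristic; the function names, the four-point data set, and the two sets of starting centres are assumptions chosen only for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
        for p in points
    ]


def update(points, labels, centres):
    """Move each centre to the mean of its members (empty clusters stay put)."""
    new_centres = []
    for j, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centres.append(old)
            continue
        new_centres.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centres


def k_means(points, initial_centres, max_iter=100):
    """Plain k-means started from caller-supplied initial centres."""
    centres = [tuple(float(x) for x in c) for c in initial_centres]
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = update(points, labels, centres)
        if new_centres == centres:  # converged: centres no longer move
            break
        centres = new_centres
    return centres, assign(points, centres)


def sse(points, centres, labels):
    """Sum of squared errors: squared distance of each point to its centre."""
    return sum(squared_distance(p, centres[label]) for p, label in zip(points, labels))


# Illustrative data: two tight pairs of points, far apart horizontally.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Starting centres on the left and right pairs recover the natural grouping.
good_centres, good_labels = k_means(points, [(0.0, 0.0), (4.0, 0.0)])

# Starting centres stacked in the middle converge to a bottom/top split.
poor_centres, poor_labels = k_means(points, [(2.0, 0.0), (2.0, 1.0)])

print(sse(points, good_centres, good_labels))  # 1.0
print(sse(points, poor_centres, poor_labels))  # 16.0
```

Both runs converge, yet the second stops at a partition whose SSE is sixteen times larger. This sensitivity is the reason suitable initial centres matter for k-means.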
