Abstract

Although clustering is an unsupervised learning approach, most clustering algorithms require parameters (such as the number of clusters, a minimum density or a distance threshold) to be set in advance to work properly. Moreover, discovering an appropriate set of clusters is a difficult task, since clusters can have any shape, size and density, and it becomes harder in the presence of noise. In fact, noise can degrade the results of many clustering techniques that are based on the least squares estimate. This paper presents a data clustering algorithm that does not require a parameter setting process, the Parameter-less Randomized Gravitational Clustering algorithm (Pl-Rgc), and combines it with a mechanism, based on micro-cluster ideas, for representing a cluster as a set of prototypes. To remove the parameter setting of the Randomized Gravitational Clustering (Rgc) algorithm, a set of parameter estimation strategies previously developed for Rgc is combined with a newly developed stopping criterion based on the average number of points merged iteration by iteration. The performance of the proposed Pl-Rgc algorithm is evaluated experimentally on two types of synthetic data sets (data sets with Gaussian clusters and data sets with non-parametric clusters) and two types of real data sets (five classic machine learning classification data sets and one intrusion detection data set). Our results show that the proposed mechanism is able to deal with noise, finds the appropriate number of clusters and finds an appropriate set of cluster prototypes regardless of the type of data it is working on.

