Abstract

In recent years, several clustering algorithms have been developed, for example K-means and K-medoids. Most of these algorithms share two problems: the appropriate number of clusters must be selected in advance, and they are sensitive to noisy data, which reduces the accuracy of the resulting clustering. This paper therefore introduces a new geometric approach, the Hybrid Grid-based Gravitational Clustering Algorithm (HGGCA), which automatically detects the number of clusters in the target data set, finds clusters of arbitrary shape, and filters out noisy data. The proposed algorithm moves the cluster centers toward areas of high data density using Newton's law of gravity and Newton's laws of motion. Cluster validity indicators were used to assess both the proposed and the existing clustering methods. The experimental results show that the proposed method is more accurate than the existing K-means and K-medoids methods and that it produces high-quality clusters.

Highlights

  • Clustering is arguably the most significant unsupervised learning problem

  • Clustering is a task of combining similar objects in one group and dissimilar objects in another group (Han, 2006)

  • $\sum_{i=1}^{P} (x_i, y_i)$, where P is the total number of data points within the ith grid. Step 3.3: Update the grid center $C_i$ of each grid by Newton's law of gravity and Newton's laws of motion


Summary

01. Introduction

Clustering is arguably the most significant unsupervised learning problem. Clustering is the task of placing similar objects in one group and dissimilar objects in another group (Han, 2006). Cluster analysis finds similarities between data according to their characteristics. Its methods include the partitioning, hierarchical, density-based, grid-based, gravitational clustering (Thammano and Sangkapas, 2011; Gomez et al., 2003), model-based, and constraint-based methods (Jain et al., 1999). The existing k-means and k-medoids methods require the value of k (the number of clusters) to be determined before clustering, which is a difficult task. Our focus was on grid-based and gravitational clustering methods.
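To make the combination of grid-based and gravitational ideas concrete, the following is a minimal sketch and not the HGGCA procedure itself. It assumes 2D data, square grid cells, a grid "mass" equal to the number of points in the cell, and centers that start at rest. The function names `grid_centers` and `gravity_step` and the parameters `cell_size`, `G`, `dt`, and `eps` are hypothetical choices made for illustration.

```python
import numpy as np

def grid_centers(points, cell_size):
    """Bin 2D points into square cells; return each occupied cell's
    mean position (its center) and its point count (used as mass)."""
    cells = np.floor(points / cell_size).astype(int)
    keys, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guard against shape differences across NumPy versions
    sums = np.zeros((len(keys), 2))
    np.add.at(sums, inverse, points)  # accumulate coordinates per cell
    return sums / counts[:, None], counts.astype(float)

def gravity_step(centers, masses, G=1.0, dt=0.1, eps=1e-3):
    """Move every center one time step under the gravitational pull of
    all other centers, assuming zero initial velocity (assumption)."""
    diff = centers[None, :, :] - centers[:, None, :]  # diff[i, j] = c_j - c_i
    dist2 = (diff ** 2).sum(axis=-1) + eps            # softened squared distance
    weight = masses[None, :] / dist2 ** 1.5           # m_j / r^3
    np.fill_diagonal(weight, 0.0)                     # no self-attraction
    acc = G * (weight[:, :, None] * diff).sum(axis=1) # a_i = G * sum_j m_j r_ij / r^3
    return centers + 0.5 * acc * dt ** 2              # x = x0 + a t^2 / 2

# Hypothetical toy data: three points in one cell, one point in another.
pts = np.array([[0.2, 0.5], [0.5, 0.5], [0.8, 0.5], [2.5, 0.5]])
centers, masses = grid_centers(pts, cell_size=1.0)
moved = gravity_step(centers, masses)
```

In this toy setup the sparsely populated cell is pulled toward the densely populated one more strongly than the reverse, which is the intuition behind moving centers toward high-density areas. Unlike k-means or k-medoids, nothing in the sketch requires k to be chosen up front; how HGGCA merges centers and filters noise is not covered here.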

02. Existing Method
03. Proposed Method
Synthetic 2D data set
07. Discussion