A Vibration Method for Discovering Density Varied Clusters

Mohammad T Elbatta,Wesam M Ashour,Raed M Bolbol

doi:10.5402/2012/723516

Open Access

https://doi.org/10.5402/2012/723516

Journal: ISRN Artificial Intelligence
Publication Date: Nov 15, 2011
Citations: 27
License type: CC BY 3.0

Affiliation: Islamic University of Gaza

Abstract

DBSCAN is a base algorithm for density-based clustering. It can find clusters of different shapes and sizes in large data sets that contain noise and outliers. However, it fails to handle the local density variation that exists within a cluster. A good clustering method should allow significant density variation within a cluster because, if we insist on homogeneous clustering, a large number of small, unimportant clusters may be generated. In this paper, an enhancement of the DBSCAN algorithm is proposed that detects clusters of different shapes and sizes that differ in local density. Our proposed method, VMDBSCAN, first finds the “core” of each cluster (the clusters being those generated by applying DBSCAN). Then, it “vibrates” points toward the cluster that has the maximum influence on them. As a result, the proposed method can find the correct number of clusters.

Highlights

Unsupervised clustering is an important data analysis task that tries to organize a data set into separate groups with respect to a distance or, equivalently, a similarity measure [1].
The EDBSCAN [26] algorithm is another extension of DBSCAN; it keeps track of the density variation that exists within a cluster.
Because DBSCAN fails to detect density-varied clusters, many studies have proposed enhancements of DBSCAN for handling density variation within a cluster.

Summary

Introduction

Unsupervised clustering is an important data analysis task that tries to organize a data set into separate groups with respect to a distance or, equivalently, a similarity measure [1]. Hierarchical algorithms use distance measurements between the objects and between the clusters. Grid-based algorithms quantize the object space into a finite number of cells (hyper-rectangles) and perform the required operations on the quantized space. The advantage of this approach is its fast processing time, which is in general independent of the number of data objects. Model-based algorithms find good approximations of model parameters that best fit the data. They can be either partitional or hierarchical, depending on the structure or model they hypothesize about the data set and the way they refine this model to identify partitionings. They are closer to density-based algorithms in that they grow particular clusters so that the preconceived model is improved.

Related Work

DBSCAN Algorithm
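
The abstract's point about density variation can be illustrated with a minimal sketch of plain DBSCAN using one global radius. This is not the paper's VMDBSCAN and not its experimental setup. The toy data, the function names (`dbscan`, `region_query`, `grid`), and the parameter values `eps` and `min_pts` are all assumptions chosen for illustration. The data holds two dense groups close together and one sparse group far away.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN with a single global eps; returns one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is a core point: keep expanding
    return labels

def grid(ox, oy, step):
    """Hypothetical toy data: a 3x3 grid of points starting at (ox, oy)."""
    return [(ox + i * step, oy + j * step) for i in range(3) for j in range(3)]

# Two dense groups (spacing 0.1) near each other, one sparse group (spacing 1.0).
points = grid(0.0, 0.0, 0.1) + grid(1.0, 0.0, 0.1) + grid(10.0, 10.0, 1.0)

small = dbscan(points, eps=0.15, min_pts=3)  # tuned to the dense groups
large = dbscan(points, eps=1.5, min_pts=3)   # tuned to the sparse group

def n_clusters(labels):
    return len(set(labels) - {NOISE})

print(n_clusters(small), small.count(NOISE))
print(n_clusters(large), large.count(NOISE))
```

With the small radius the sparse group is labeled entirely as noise, and with the large radius the two dense groups merge into one cluster, so neither setting recovers the three intended groups in this toy layout. That is the kind of failure a density-aware extension is meant to address.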

The Proposed Algorithm

Simulation and Results

Conclusions and Future Works