An Efficient Density Biased Sampling Algorithm for Clustering Large High-Dimensional Datasets

Qian Xia, Jie Deng, Heng Qian, Qin Wu

doi:10.1142/s0218001415500263

Abstract

Simple random sampling (SRS), one of the most popular data reduction methods for large-scale data mining, often loses small clusters when applied to unevenly distributed datasets. Grid-based density biased sampling avoids this problem, but the granularity of the grid division affects both the efficiency and the effectiveness of the algorithm. Variable grid density biased sampling was proposed to overcome this drawback for large-scale, unevenly distributed datasets; however, its efficiency is limited by dimensionality. To address this, an efficient density biased sampling algorithm is proposed for large high-dimensional datasets. First, an efficient feature selection method is designed to obtain the feature subsets. Second, variable grid division is performed on the selected feature subsets. Finally, the sample is drawn from the grid space. Experiments on synthetic datasets and UCI datasets show that the proposed algorithm achieves higher quality than SRS, while requiring less sampling time than both grid-based density biased sampling and density biased sampling based on variable grid division.
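
The three steps named in the abstract (feature selection, grid division, sampling from the grid) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the feature selection method or the variable grid rule, so the code below assumes a variance-based feature ranking and a fixed-width grid as stand-ins. All function names and the parameters `k`, `bins`, and `e` are hypothetical. The weighting keeps each point in a cell of size n with probability proportional to n^(-e), which is one simple way to bias a sample toward sparse regions.

```python
import random
from collections import defaultdict


def select_features(points, k):
    """Assumed stand-in for feature selection: keep the k highest-variance dimensions."""
    n, dims = len(points), len(points[0])
    variances = []
    for d in range(dims):
        col = [p[d] for p in points]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return sorted(range(dims), key=lambda d: variances[d], reverse=True)[:k]


def grid_cells(points, features, bins):
    """Map point indices to fixed-width grid cells over the selected features.

    A fixed grid is an assumption; the paper uses a variable grid division.
    """
    lows = {d: min(p[d] for p in points) for d in features}
    highs = {d: max(p[d] for p in points) for d in features}
    cells = defaultdict(list)
    for i, p in enumerate(points):
        key = []
        for d in features:
            # Fall back to width 1.0 when a dimension is constant.
            width = (highs[d] - lows[d]) / bins or 1.0
            key.append(min(int((p[d] - lows[d]) / width), bins - 1))
        cells[tuple(key)].append(i)
    return cells


def density_biased_sample(points, sample_size, k=2, bins=4, e=1.0, seed=0):
    """Return (sampled point indices, grid cells).

    e = 0 reduces to uniform sampling; e = 1 gives every occupied cell
    the same expected number of sampled points.
    """
    rng = random.Random(seed)
    features = select_features(points, k)
    cells = grid_cells(points, features, bins)
    # Normaliser chosen so the expected sample size equals sample_size
    # (before probabilities are clipped at 1).
    norm = sum(len(members) ** (1.0 - e) for members in cells.values())
    sample = []
    for members in cells.values():
        prob = min(1.0, sample_size * len(members) ** (-e) / norm)
        sample.extend(i for i in members if rng.random() < prob)
    return sample, cells
```

With `e=1.0`, a cell holding 10 points and a cell holding 1000 points contribute the same expected number of sampled points, so a small cluster is far less likely to vanish than under SRS. The sample size is met only in expectation, since each point is kept or dropped independently.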

Journal: International Journal of Pattern Recognition and Artificial Intelligence
Publication Date: Nov 22, 2015
Citations: 3

Similar Papers

Efficient biased sampling for approximate clustering and outlier detection in large data sets
G Kollios ... N Koudas
IEEE Transactions on Knowledge and Data Engineering | VOL. 15
01 Sep 2003

Density biased sampling algorithm based on variable grid division
Kaiyuan Sheng ... Xuezhong Qian
Journal of Computer Applications | VOL. 33
08 Nov 2013

Density biased sampling
Christopher R. Palmer ... Christos Faloutsos
ACM SIGMOD Record | VOL. 29
16 May 2000

An efficient approximation scheme for data mining tasks
G Kollios ... N Koudas
02 Apr 2001
