An efficient approximation scheme for data mining tasks

G Kollios, D Gunopulos, S Berchtold, N Koudas

doi:10.1109/icde.2001.914858

Abstract

We investigate the use of sampling biased by the density of the dataset to speed up general data mining tasks, such as clustering and outlier detection, in large multidimensional datasets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the dataset around that point. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest and can be tuned for specific data mining tasks. This allows greater flexibility and better accuracy than simple random sampling. We describe our approach in detail, evaluate it analytically, and show how it can be optimized for approximate clustering and outlier detection. Finally, we present a thorough experimental evaluation of the proposed method, applying density-biased sampling to real and synthetic datasets and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.
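To make the idea of density-dependent inclusion probabilities concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel density estimate and an inclusion probability proportional to the local density raised to a tunable exponent; the function names `local_density` and `density_biased_sample` and the parameters `exponent` and `bandwidth` are hypothetical choices made for this illustration, since the abstract does not specify them.

```python
import math
import random


def local_density(points, bandwidth):
    """Gaussian-kernel density estimate at each point.

    Assumption: a brute-force O(n^2) estimate, used only for illustration.
    """
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / len(points))
    return densities


def density_biased_sample(points, expected_size, exponent, bandwidth=1.0, seed=None):
    """Keep each point independently with probability proportional to density ** exponent.

    Assumption: exponent > 0 favours dense regions, exponent < 0 favours
    sparse regions, and exponent == 0 reduces to uniform sampling.
    Probabilities are clipped at 1, so the expected sample size is at most
    expected_size.
    """
    rng = random.Random(seed)
    weights = [d ** exponent for d in local_density(points, bandwidth)]
    scale = expected_size / sum(weights)
    sample = []
    for point, weight in zip(points, weights):
        if rng.random() < min(1.0, scale * weight):
            sample.append(point)
    return sample
```

Under these assumptions, a negative exponent would suit outlier detection, because isolated points receive higher inclusion probabilities, while a positive exponent would concentrate the sample in dense regions, which may help when looking for clusters in noisy data.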

Similar Papers

Efficient biased sampling for approximate clustering and outlier detection in large data sets
G Kollios ... N Koudas
IEEE Transactions on Knowledge and Data Engineering | VOL. 15
01 Sep 2003

Enhancing Outlier Detection by an Outlier Indicator
Xiaqiong Li ... Xia Li Wang
01 Jan 2018

Outlier Detection Based on Cluster Outlier Factor and Mutual Density
Zhongping Zhang ... Jingyang Qiu
01 Jan 2019

Density Biased Sampling with Locality Sensitive Hashing for Outlier Detection
Xuyun Zhang ... Qiang He
01 Jan 2018