Mining outliers with faster cutoff update and space utilization

Chi-Cheong Szeto,Edward Hung

doi:10.1016/j.patrec.2010.04.002

Abstract

Ramaswamy et al.’s distance-based outlier definition is a desirable way to find unusual data objects because it requires only a metric distance function between two objects. Unlike the definition of Knorr and Ng, on which many existing algorithms are based, it needs no neighborhood distance threshold. Bay and Schwabacher proposed ORCA, an efficient algorithm for this task that can give near-linear time performance. To reduce the running time further, we propose in this paper two algorithms, RC and RS, which use the following two techniques, respectively: (i) faster cutoff update and (ii) space utilization after pruning. We tested RC, RS, and RCS (a hybrid approach combining RC and RS) on several large, high-dimensional real data sets with millions of objects. The experiments show that RCS runs 1.4–2.3 times as fast as ORCA and that its improvement is relatively insensitive to increases in data size.
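For readers unfamiliar with the setting, the following is a minimal Python sketch of the baseline the abstract builds on. It is not the RC, RS, or RCS algorithm, whose details the abstract does not give. It assumes the common reading of Ramaswamy et al.'s definition (rank each object by the distance to its k-th nearest neighbor and report the top n) and an ORCA-style nested loop that scans in random order and abandons a candidate once its score can no longer reach the current cutoff. The function name `top_n_outliers` and its parameters are hypothetical, chosen only for illustration.

```python
import heapq
import math
import random


def top_n_outliers(points, k, n, seed=0):
    """Return up to n (score, index) pairs, highest score first.

    score = distance from a point to its k-th nearest neighbor
    (assumed reading of the distance-based outlier definition).
    """
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random scan order lets pruning fire early

    top = []      # min-heap of (score, index); top[0] is the weakest kept outlier
    cutoff = 0.0  # score of the weakest kept outlier once n have been found
    for i in order:
        knn = []  # negated distances: a max-heap of the k smallest seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if len(knn) < k:
                heapq.heappush(knn, -d)
            elif d < -knn[0]:
                heapq.heapreplace(knn, -d)
            # -knn[0] is an upper bound on the final k-th neighbor distance,
            # so a candidate already below the cutoff cannot be a top-n outlier.
            if len(knn) == k and -knn[0] < cutoff:
                pruned = True
                break
        if pruned or len(knn) < k:
            continue
        score = -knn[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        else:
            heapq.heappushpop(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]  # the cutoff only ever grows
    return sorted(top, reverse=True)
```

Because the cutoff rises as stronger outliers are found, most non-outlying objects are discarded after only a few distance computations. In this sketch the cutoff changes only after a candidate has been scanned in full.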

Journal: Pattern Recognition Letters
Publication Date: Apr 10, 2010
Citations: 5

Similar Papers

Mining Outliers with Faster Cutoff Update and Space Utilization
Chi-Cheong Szeto ... Edward Hung
01 Jan 2009

An efficient random forests algorithm for high dimensional data classification
Qiang Wang ... Thanh-Tung Nguyen
Advances in Data Analysis and Classification | VOL. 12
21 Mar 2018

Research on Fuzzy Clustering Algorithms for Large Dimensional Data Sets Under Cloud Computing
Shuang-Cheng Jia ...
01 Jan 2020

Mining distance-based outliers in near linear time with randomization and a simple pruning rule
Stephen D Bay ... Mark Schwabacher
24 Aug 2003
