An Effective Minimal Probing Approach With Micro-Cluster for Distance-Based Outlier Detection in Data Streams

Mohamed Jaward Bah,Hongzhi Wang,Furkh Zeshan,Hanan Aljuaid,Mohamed Hammad

doi:10.1109/access.2019.2946966

Abstract

Outlier detection in data streams is a significant data mining task that targets the discovery of anomalous elements in data arriving at an unprecedented rate. The fast arrival of data demands fast computation with minimal memory usage, which makes detecting distance-based outliers in this setting more complicated. Existing techniques, such as the two best-known methods, Micro-Cluster Outlier Detection (MCOD) and Thresh_LEAP, have presented some solutions to these challenges. However, combining the strengths of both techniques can improve substantially on either method alone. Therefore, in this paper, we propose a method called Micro-Cluster with Minimal Probing (MCMP), a hybrid approach that combines the strengths of MCOD and Thresh_LEAP. It is a new distance-based outlier detection technique that minimizes the computational cost of detecting distance-based outliers effectively. The proposed MCMP technique comprises two approaches. First, we adopt micro-clusters to reduce the range query search. Then, to deal with the objects outside the micro-clusters, we propose the concept of differentiating between strong and trivial inliers. The proposed method improves computational speed and memory consumption while maintaining outlier detection accuracy. Our experiments are conducted on both real-world and synthetic data sets.
We varied the window size $(w)$, neighbor count threshold $(k)$, and distance threshold $(R)$, and observed that our method outperforms the state-of-the-art methods in both CPU time and memory consumption in the majority of the datasets.
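
To make the roles of the three parameters concrete, here is a minimal Python sketch of the naive baseline that distance-based methods aim to speed up: a point in the current window is an outlier if it has fewer than $k$ neighbors within distance $R$. This is not the MCMP algorithm. The function names, the count-based window of the last $w$ points, and the Euclidean distance are assumptions made for illustration.

```python
from collections import deque
from math import dist


def window_outliers(window, R, k):
    """Return the points in `window` with fewer than k neighbors within distance R."""
    outliers = []
    for i, p in enumerate(window):
        # Full range query: compare p against every other point in the window.
        neighbors = sum(
            1 for j, q in enumerate(window) if i != j and dist(p, q) <= R
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def stream_outliers(stream, w, R, k):
    """Yield the outlier list after each arrival, keeping only the last w points."""
    window = deque(maxlen=w)  # assumed count-based sliding window
    for point in stream:
        window.append(point)
        yield window_outliers(window, R, k)


stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
results = list(stream_outliers(stream, w=4, R=0.5, k=2))
```

Every arrival triggers a quadratic number of distance computations here, which is the cost that micro-clusters and minimal probing are meant to cut.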

Highlights

The process of detecting data points that do not conform to expected normal behavior (outliers) is an increasingly significant domain in many fields [1]–[3]
We propose a new solution to the problem of distance-based outlier detection by implementing a new technique called Micro-Cluster with Minimal Probing (MCMP)
We propose a method that applies effective minimal probing on data points outside the micro-clusters (PD) [24] by introducing the concept of strong and trivial outliers
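
The two ideas named in these highlights can be illustrated with a simplified Python sketch. It is not the authors' implementation: the greedy clustering pass, the function name `classify`, and the Euclidean distance are assumptions. It follows the MCOD convention that a micro-cluster has radius $R/2$ and at least $k+1$ members, and it leaves out the strong/trivial distinction, which this page does not define.

```python
from math import dist


def classify(points, R, k):
    """Label each point 'inlier' or 'outlier' using micro-clusters plus early-stopping probes."""
    # Assumed greedy pass: assign each point to the first center within R/2.
    centers, members = [], []
    for idx, p in enumerate(points):
        for c, center in enumerate(centers):
            if dist(p, center) <= R / 2:
                members[c].append(idx)
                break
        else:
            centers.append(p)
            members.append([idx])

    labels = {}
    pending = []  # points outside any qualifying micro-cluster (the PD list)
    for group in members:
        if len(group) >= k + 1:
            # Any two members lie within R of each other (triangle inequality),
            # so each has at least k neighbors and needs no range query.
            for idx in group:
                labels[idx] = "inlier"
        else:
            pending.extend(group)

    for idx in pending:
        found = 0
        for j, q in enumerate(points):
            if j != idx and dist(points[idx], q) <= R:
                found += 1
                if found == k:  # stop probing once k neighbors are confirmed
                    break
        labels[idx] = "inlier" if found >= k else "outlier"
    return [labels[i] for i in range(len(points))]


labels = classify([(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)], R=0.5, k=2)
```

In this example the first three points form one micro-cluster and are labeled inliers without any range query; only the isolated point is probed.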

Summary

Introduction

The process of detecting data points that do not conform to expected normal behavior (outliers) is an increasingly significant domain in many fields [1]–[3]. It has attracted growing attention, especially in the data mining community. In previous years, the traditional approach to detecting outliers focused mainly on batch processing, where data was readily available [4], [5]. Most data are now viewed as dynamic, fast-incoming data streams [8]. The processing of these data streams has become a significant area of study, especially for detecting abnormal behavior or unusual data points.

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 6
License type: CC BY 4.0

Similar Papers

Outlier Detection in Non-stationary Data Streams
Luan Tran ... Cyrus Shahabi
23 Jul 2019

Fast Distance-based Outlier Detection in Data Streams based on Micro-clusters
Luan Tran ... Liyue Fan
01 Jan 2019

Distance-based outlier detection in data streams
Luan Tran ... Cyrus Shahabi
Proceedings of the VLDB Endowment | VOL. 9
01 Aug 2016

Fast Memory Efficient Local Outlier Detection in Data Streams
Mahsa Salehi ... James C Bezdek
IEEE Transactions on Knowledge and Data Engineering | VOL. 28
01 Dec 2016
