Abstract

Outlier detection in data streams is a significant data mining task that aims to discover unusual elements in data arriving at an unprecedented rate. Fast-arriving data demand fast computation with minimal memory usage, which makes detecting distance-based outliers in this setting more complicated. The two best-known existing techniques, Micro-Cluster Outlier Detection (MCOD) and Thresh_LEAP, offer some solutions to these challenges. However, combining the strengths of both techniques can improve on each individual method. Therefore, in this paper, we propose a method called Micro-Cluster with Minimal Probing (MCMP), a hybrid approach that combines the strengths of MCOD and Thresh_LEAP. It is a new distance-based outlier detection technique that minimizes the computational cost of detecting distance-based outliers effectively. The proposed MCMP technique comprises two approaches. First, we adopt micro-clusters to mitigate the range query search. Then, to deal with the objects outside the micro-clusters, we propose the concept of differentiating between strong and trivial inliers. The proposed method improves computational speed and memory consumption while maintaining outlier detection accuracy. Our experiments are conducted on both real-world and synthetic data sets. We varied the window size $(w)$, neighbor count threshold $(k)$, and distance threshold $(R)$, and observed that our method outperforms the state-of-the-art methods in both CPU time and memory consumption on the majority of the data sets.
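
To make the roles of the three parameters concrete, the following is a minimal sketch of a naive baseline, not the MCMP method itself. It assumes the standard distance-based definition (a point is an outlier if fewer than k other points in the current window lie within distance R of it), a count-based sliding window of size w, and Euclidean distance. The function names `window_outliers` and `stream_outliers` are hypothetical.

```python
from collections import deque
from math import dist


def window_outliers(window, k, R):
    """Return the points in `window` with fewer than k neighbors within R.

    Assumption: Euclidean distance and the standard distance-based
    outlier definition; this is a brute-force O(n^2) range query.
    """
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        # Count the other points that fall inside the radius-R ball around p.
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and dist(p, q) <= R
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def stream_outliers(stream, w, k, R):
    """Yield the outlier list for every full count-based window of size w."""
    window = deque(maxlen=w)  # the oldest point expires automatically
    for point in stream:
        window.append(point)
        if len(window) == w:
            yield window_outliers(window, k, R)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    for result in stream_outliers(data, w=4, k=2, R=1.0):
        print(result)
```

Every slide repeats a full range query for every point, which is the cost that micro-clusters and minimal probing are meant to reduce.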

Highlights

  • The process of detecting data points that do not conform to expected normal behavior (outliers) is an increasingly significant domain in many fields [1]–[3]

  • We propose a new solution to the problem of distance-based outlier detection by implementing a new technique called Micro-Cluster with Minimal Probing (MCMP)

  • We propose a method that applies effective minimal probing on data points outside the micro-clusters (PD) [24] by introducing the concept of strong and trivial inliers
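
The page does not spell out how micro-clusters are built, so the sketch below rests on an assumption: it follows the convention commonly described for MCOD, where a micro-cluster has radius R/2 around a center and holds at least k + 1 points. Under that assumption, any two members are within R of each other by the triangle inequality, so every member has at least k neighbors and is an inlier without any range query. Only the leftover points would need probing. The function name `build_micro_clusters` and the greedy construction are hypothetical, and this is not the authors' implementation.

```python
from math import dist


def build_micro_clusters(points, k, R):
    """Greedily group points into micro-clusters of radius R/2.

    Assumption: a micro-cluster needs at least k + 1 members (center
    included). Returns (clusters, leftovers); leftovers are the points
    that still need a neighbor search.
    """
    assigned = set()
    clusters = []
    for i, center in enumerate(points):
        if i in assigned:
            continue
        # Unassigned points within R/2 of the candidate center; this
        # includes the center itself because its distance is 0.
        members = [
            j
            for j, p in enumerate(points)
            if j not in assigned and dist(center, p) <= R / 2
        ]
        if len(members) >= k + 1:
            clusters.append([points[j] for j in members])
            assigned.update(members)
    leftovers = [p for j, p in enumerate(points) if j not in assigned]
    return clusters, leftovers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    clusters, leftovers = build_micro_clusters(data, k=2, R=1.0)
    print(clusters)
    print(leftovers)
```

A leftover point is not necessarily an outlier; it is only a point whose status cannot be decided from cluster membership alone.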

Summary

Introduction

The process of detecting data points that do not conform to expected normal behavior (outliers) is an increasingly significant domain in many fields [1]–[3]. It has attracted growing attention, especially in the data mining community. In previous years, the traditional approach to detecting outliers focused mainly on batch processing, where the data were readily available [4], [5]. However, most data are now viewed as dynamic, fast incoming data streams [8]. The processing of these data streams has become a significant area of study, especially for detecting abnormal behavior or unusual data points.
