Abstract
Detecting outliers in data streams is a challenging problem because, in a streaming scenario, scanning the data multiple times is infeasible and the incoming data keep evolving. Over the years, clustering-based methods have been a common approach to outlier detection, but these methods have inherent challenges and drawbacks. These include difficulty in effectively clustering sparse data points (which depends on the quality of the clustering method), handling continuous, fast-arriving data streams, high memory and time consumption, and limited outlier detection accuracy. This paper proposes an effective clustering-based approach to detecting outliers in evolving data streams. We propose a new method called Effective Microcluster and Minimal pruning CLustering-based method for Outlier detection in Data Streams (EMM-CLODS). It detects outliers in evolving data streams by first applying a microclustering technique to cluster dense data points and then handling objects within a sliding window according to the relevance of their status to their respective neighbors or position. Our experimental studies on both synthetic and real-world datasets show that the technique performs well, with minimal memory and time consumption compared to the baseline algorithms, making it a very promising technique for outlier detection problems in data streams.
Highlights
In the current era, the need to detect abnormal behavior in order to reveal salient facts and observations and to make accurate predictions from data is extremely significant.
We propose a new microclustering and minimal pruning clustering-based unsupervised outlier detection scheme to detect outliers in data streams while simultaneously addressing the mentioned challenges. The proposed approach involves different stages to adapt to dynamic changes in the data distribution and aims to eliminate the limitations of previously proposed methods. The newly proposed method is called Effective Microcluster and Minimal pruning CLustering-based method for Outlier detection in Data Streams (EMM-CLODS), which is a clustering-based outlier detection approach.
The second dataset adopted for our experiments is the tropical atmospheric ocean project (TAO) dataset [32, 33], a low-dimensional dataset with three attributes and 575,648 records. The dataset is real-time data extracted from the National Oceanic and Atmospheric Administration website [33].
Summary
The need to detect abnormal behavior in order to reveal salient facts and observations and to make accurate predictions from data is extremely significant. Among the different categories of proposed outlier detection methods, clustering-based approaches have proven popular for static data, yet they remain among the most challenging to adopt for outlier detection in data streams. They have been shown to be efficient for some outlier detection tasks, offering low computational cost and high scalability in high-dimensional data [5, 12]. The process of clustering and detecting outliers in data streams is complicated because clustering techniques often involve several parameters and operate in both low- and high-dimensional spaces, constrained by excessive distance-based computation of object neighbors, noise, and so on. For this reason, clustering-based approaches perform differently across application domains and data types.
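To make the general idea concrete, the following is a minimal sketch of microcluster-assisted, distance-based outlier detection over a count-based sliding window. It is not the EMM-CLODS algorithm itself, whose stages are not detailed here. It assumes a common distance-based definition (a point is an outlier if fewer than `k` other points in the window lie within `radius` of it), and the names `window_outliers`, `detect_stream`, `radius`, `k`, and `window_size` are hypothetical. For simplicity, the sketch rebuilds the microclusters for every window instead of maintaining them incrementally.

```python
from collections import deque
from math import dist


def window_outliers(window, radius, k):
    """Return sorted indices of points with fewer than k neighbors within radius.

    Assumption: points are grouped greedily into microclusters of radius
    `radius / 2`. Any two members of one microcluster are then at most
    `radius` apart, so a microcluster with more than k members contains
    only inliers and its members need no neighbor search.
    """
    centers, members = [], []
    for i, p in enumerate(window):
        for c, m in zip(centers, members):
            if dist(p, c) <= radius / 2:
                m.append(i)
                break
        else:
            # No existing microcluster is close enough: start a new one.
            centers.append(p)
            members.append([i])

    outliers = []
    for m in members:
        if len(m) > k:
            continue  # dense microcluster: every member has at least k neighbors
        for i in m:
            # Sparse region: fall back to an explicit neighbor count.
            neighbors = sum(
                1
                for j, q in enumerate(window)
                if j != i and dist(window[i], q) <= radius
            )
            if neighbors < k:
                outliers.append(i)
    return sorted(outliers)


def detect_stream(stream, window_size, radius, k):
    """Yield (time step, outlier points) once the sliding window is full."""
    window = deque(maxlen=window_size)  # oldest point expires automatically
    for t, p in enumerate(stream):
        window.append(p)
        if len(window) == window_size:
            pts = list(window)
            yield t, [pts[i] for i in window_outliers(pts, radius, k)]
```

In this sketch, points in dense microclusters are declared inliers without any pairwise distance computation, which illustrates why microclustering can reduce the distance-based neighbor computation noted above. Only points in sparse regions pay for a full neighbor count.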