Abstract
Outlier detection is one of the most important data mining techniques. It has broad applications, including fraud detection, credit approval, computer network intrusion detection, and anti-money laundering. The basis of outlier detection is to identify data points that are “different” or “far away” from the rest of the data points in a given dataset. The traditional outlier detection method is based on statistical analysis. However, this method has an inherent drawback: it requires the entire dataset to be available. In practice, especially in real-time data feed applications, waiting for all the data is unrealistic because fresh data stream in very quickly. Outlier detection is hence done in batches. However, two drawbacks may arise: processing time is relatively long because of the massive data size, and the results may become outdated between successive updates. In this paper, we propose several novel incremental methods to process real-time data effectively for outlier detection. In the experiment, we test three types of mechanisms for analyzing the dataset, namely Global Analysis, Cumulative Analysis, and Lightweight Analysis with Sliding Window. The experimental dataset is “household power consumption”, a popular benchmark dataset for Massive Online Analysis.
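The three mechanisms named above can be contrasted with a minimal sketch. The abstract does not specify the detection rule, so the z-score threshold `k`, the window length, and the function names `global_outliers`, `cumulative_outliers`, and `sliding_window_outliers` are all assumptions made for illustration, not the method used in the paper.

```python
from collections import deque
from statistics import mean, pstdev


def global_outliers(values, k=3.0):
    """Global analysis (assumed rule): needs the whole dataset up front,
    then flags points more than k standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [sigma > 0 and abs(x - mu) > k * sigma for x in values]


def cumulative_outliers(stream, k=3.0):
    """Cumulative analysis (assumed rule): score each arriving point against
    all points seen so far, using Welford's running mean/variance update."""
    n, mu, m2 = 0, 0.0, 0.0
    flags = []
    for x in stream:
        if n >= 2:
            sigma = (m2 / n) ** 0.5
            flags.append(sigma > 0 and abs(x - mu) > k * sigma)
        else:
            flags.append(False)  # too little history to judge
        # Update the running statistics with the new point.
        n += 1
        delta = x - mu
        mu += delta / n
        m2 += delta * (x - mu)
    return flags


def sliding_window_outliers(stream, window=5, k=3.0):
    """Sliding window (assumed rule): score each arriving point against
    only the most recent `window` points, so memory stays bounded."""
    recent = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            flags.append(sigma > 0 and abs(x - mu) > k * sigma)
        else:
            flags.append(False)  # window not yet full
        recent.append(x)
    return flags


# Made-up readings for illustration only (not from the experiment dataset).
readings = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 9.0, 1.0, 1.1]
print(sliding_window_outliers(readings, window=5))
```

In this sketch the global variant must wait for all readings, while the cumulative and sliding-window variants can flag a point as soon as it arrives; the sliding-window variant additionally forgets old data, which is one plausible reading of "lightweight".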