Abstract

Outlier detection has attracted wide attention because of its broad applications, such as fault diagnosis and intrusion detection. Outlier analysis in data streams, which are highly uncertain and unbounded in length, is especially challenging. Most recent work on outlier detection has focused on the principles of the local outlier factor, and few studies address incremental updating strategies, which are vital to outlier detection in data streams. In this paper, a novel incremental local outlier detection approach is introduced to dynamically evaluate local outliers in a data stream. An extended local neighborhood consisting of k nearest neighbors, reverse nearest neighbors, and shared nearest neighbors is estimated for each data point. A theoretical analysis of the algorithm's complexity for the insertion of new data and the deletion of old data in the composite neighborhood shows that the number of data points affected by the incremental calculation is finite. Finally, experiments on both synthetic and real datasets verify the approach's scalability and outlier detection accuracy. The results show that the proposed approach performs comparably to state-of-the-art k nearest neighbor-based methods.
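To make the extended neighborhood concrete, the following is a minimal brute-force sketch in Python, not the authors' implementation. The function names `knn` and `composite_neighborhood` are hypothetical, and the shared-nearest-neighbor rule used here (two points are shared neighbors if their k-nearest-neighbor sets overlap) is an assumption, since the abstract does not give the exact definition. The sketch also recomputes everything from scratch, so it does not show the incremental update the paper proposes.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (brute force, excluding i).

    Ties in distance are broken by the smaller index.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return set(others[:k])

def composite_neighborhood(points, i, k):
    """Union of kNN, reverse kNN, and shared nearest neighbors of points[i].

    Assumption: q is a shared nearest neighbor of p when kNN(q) and kNN(p)
    have at least one point in common.
    """
    n = len(points)
    all_knn = [knn(points, j, k) for j in range(n)]
    nn = all_knn[i]
    # Reverse nearest neighbors: points that count i among their own kNN.
    rnn = {j for j in range(n) if j != i and i in all_knn[j]}
    # Shared nearest neighbors: points whose kNN overlaps with the kNN of i.
    snn = {j for j in range(n) if j != i and all_knn[j] & nn}
    return nn | rnn | snn

pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(composite_neighborhood(pts, 0, 1))  # neighborhood of the first point
print(composite_neighborhood(pts, 3, 1))  # neighborhood of the isolated point
```

In this toy example the isolated point at (10, 0) has no reverse or shared neighbors, so its composite neighborhood is no larger than its kNN set, while points inside the cluster gain extra neighbors.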

Highlights

  • Our world generates a huge amount of data, and the volume of new information will continue to grow explosively for the foreseeable future, a growth that has already overtaken storage and processing capabilities

  • The local outlier factor (LOF) strategy was first proposed in [20], where the local reachability density calculated over the k-nearest neighbors (kNN) of each data point was used to indicate outlierness

  • The proposed CNN-based local outlier factor (CLOF) method was designed to detect outliers in data streams, where varying sliding window widths, numbers of nearest neighbors k, and data dimensions were the main challenges for detection accuracy and efficiency
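As background for the LOF highlight above, the following is a minimal batch sketch of the standard local outlier factor computed from local reachability density. It is not the CLOF method and is not incremental. The function names `k_nearest` and `lof_scores` are hypothetical, each point is given exactly k neighbors (distance ties are broken by index rather than included), and all points are assumed to be distinct so that no density is infinite.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k nearest neighbors of points[i], nearest first."""
    others = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: (math.dist(points[i], points[j]), j),
    )
    return others[:k]

def lof_scores(points, k):
    """Standard (batch) LOF score for every point; values near 1 mean inlier."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance,
        # where reach-dist(i, o) = max(k-distance(o), d(i, o)).
        reach = [max(k_dist[o], math.dist(points[i], points[o])) for o in neigh[i]]
        return len(reach) / sum(reach)

    dens = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(dens[o] for o in neigh[i]) / (len(neigh[i]) * dens[i]) for i in range(n)]

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for point, score in zip(data, lof_scores(data, k=2)):
    print(point, round(score, 2))
```

Points in the unit square receive a score of about 1, while the distant point at (10, 10) receives a much larger score, which is the sense in which local reachability density indicates outlierness.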


Summary

Introduction

Our world generates a huge amount of data, and the volume of new information will continue to grow explosively for the foreseeable future, a growth that has already overtaken storage and processing capabilities. A considerable portion of these data is generated continuously as data streams by different applications, for example, structural health monitoring, industrial fault detection, and intrusion and fraud detection for Internet data. As an important research direction in data stream mining, outlier (or anomaly) detection usually involves discovering observations that deviate so much from other observations as to arouse suspicion that they were generated by a different mechanism [2]. A data stream changes dynamically, is unbounded in volume, and may have many dimensions and a high traffic rate. These properties make outlier detection in data streams both a difficult challenge and a promising research direction, especially for applications with limited computing capability, storage space, and energy [3,4].

