Abstract

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent outlier detection methods based on the density-based local outlier factor (LOF) algorithm do not account for variations in the data distribution over time; for example, a new cluster of data points may emerge in the stream. We therefore present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF), to overcome this issue. In addition, we have developed a means of estimating the LOF score, termed "approximate LOF," based on historical information retained after the removal of outdated data. Experimental results demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar execution times. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.

Highlights

  • The expansion of the Internet of Things (IoT) is increasing the importance of outlier detection in streaming data

  • We evaluated MiLOF, DILOF, and the proposed time-aware density-based incremental local outlier detection (TADILOF) algorithm in terms of AUC and execution time on various datasets

  • We set K to 8, the value used in DILOF [6]


Introduction

The expansion of the Internet of Things (IoT) is increasing the importance of outlier detection in streaming data. The local outlier factor (LOF), proposed in [3], is a well-known density-based algorithm for the detection of local outliers in static data. LOF measures the local deviation of a data point with respect to its K nearest neighbors, where K is a user-defined parameter. Methods of this kind are useful in several applications, such as detecting fraudulent transactions, intrusion detection, direct marketing, and medical diagnostics. To handle data streams, recent algorithms impose a fixed window size to limit the number of data points held in memory, summarizing previous data points; a sketch of this windowed approach is given below. However, these studies base their summaries only on the distribution of previous data; i.e., they do not take the temporal sequence of the data into account. Related work has also addressed parameter reduction for density-based clustering and applied a variety of machine learning and outlier detection approaches to different preprocessing tasks.
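To make the windowed LOF idea concrete, the following is a minimal Python sketch that re-fits scikit-learn's standard LocalOutlierFactor on a fixed-size sliding window at each arrival. This is not the TADILOF algorithm itself (TADILOF updates scores incrementally and summarizes discarded points); the window size W, the synthetic stream, and the outlier threshold are illustrative assumptions.

    from collections import deque

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    K = 8    # number of nearest neighbors; the value used in DILOF [6]
    W = 200  # fixed sliding-window size (an illustrative assumption)

    rng = np.random.default_rng(0)
    window = deque(maxlen=W)  # the oldest point is dropped automatically

    def stream(n=1000):
        # Synthetic stream: one Gaussian cluster with ~2% uniform outliers.
        for _ in range(n):
            if rng.random() < 0.02:
                yield rng.uniform(-6.0, 6.0, size=2)
            else:
                yield rng.normal(0.0, 1.0, size=2)

    for point in stream():
        window.append(point)
        if len(window) <= K:
            continue  # LOF needs more than K points in the window
        lof = LocalOutlierFactor(n_neighbors=K)
        lof.fit(np.asarray(window))
        # negative_outlier_factor_ stores -LOF; the last entry is the newest point.
        score = -lof.negative_outlier_factor_[-1]
        if score > 1.5:  # illustrative threshold; LOF near 1 indicates an inlier
            print(f"possible outlier {point} (LOF = {score:.2f})")

Re-fitting LOF on every arrival is costly, and discarding the oldest points loses their density information outright; the incremental updates and density summaries in methods such as MiLOF, DILOF, and TADILOF are designed to avoid exactly these two problems.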
