RODA: A Fast Outlier Detection Algorithm Supporting Multi-Queries

Qian Ma, Jiafan Li, Mei Bai, Xite Wang

doi:10.1109/access.2021.3058660

Abstract

Outlier detection is an important task in big data analysis and is widely used in network security, sensor data analysis, public health, and other fields. As upper-layer applications continue to expand, an outlier detection system must process a large number of query requests in a very short time, which places high demands on the timeliness of outlier detection algorithms. To address this problem, this paper proposes an efficient algorithm, the R-tree based Outlier Detection Algorithm (RODA), which supports both single-query and multiple-query processing. For single-query processing, we first extend the R-tree index and propose a new outlier estimation method. Using these techniques, the algorithm greatly reduces the retrieval space by preferentially scanning data points with high outlier degrees. For multiple-query processing, the algorithm analyzes the sharing mechanism between queries so that multiple detection tasks can be handled in a single pass. Finally, experimental results show that RODA improves running efficiency and has good applicability and practical significance.

Highlights

With the rapid development of the Internet and the widespread use of embedded devices such as sensors, data is being collected at an unprecedented scale.
The RODA algorithm proposed in this paper is an R-tree based outlier detection algorithm that supports multiple queries.
We preferentially select R-tree nodes with lower density and, within each node, data points with a larger outlier degree, so that a good threshold value is obtained early for filtering operations, which greatly reduces the amount of calculation and improves the efficiency of outlier detection.
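
The filtering idea in the last highlight can be illustrated with a small Python sketch. It is not the RODA implementation: the R-tree is replaced here by a uniform grid whose cell counts stand in for node density, and the outlier degree is assumed to be the distance to the k-th nearest neighbor, a common definition that this summary does not confirm. The names `kth_nn_distance`, `top_n_outliers`, and `cell_size` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def kth_nn_distance(i, points, k, cutoff):
    """Distance from points[i] to its k-th nearest neighbor.

    Returns None as soon as the k-th nearest distance seen so far is
    <= cutoff, because the final value can only get smaller.
    """
    nearest = []  # max-heap (negated) of the k smallest distances so far
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if len(nearest) < k:
            heapq.heappush(nearest, -d)
        elif d < -nearest[0]:
            heapq.heapreplace(nearest, -d)
        if len(nearest) == k and -nearest[0] <= cutoff:
            return None  # cannot enter the current top-n
    return -nearest[0] if len(nearest) == k else math.inf


def top_n_outliers(points, k, n, cell_size=1.0):
    """Return ([(score, index), ...] sorted by descending score, pruned count)."""
    # Assumption: grid cell population approximates R-tree node density.
    counts = defaultdict(int)
    cells = []
    for p in points:
        cell = tuple(math.floor(x / cell_size) for x in p)
        cells.append(cell)
        counts[cell] += 1
    # Visit points in sparse cells first so the threshold rises early.
    order = sorted(range(len(points)), key=lambda i: counts[cells[i]])

    top = []  # min-heap of (score, index); top[0][0] is the threshold
    pruned = 0
    for i in order:
        cutoff = top[0][0] if len(top) == n else -1.0
        score = kth_nn_distance(i, points, k, cutoff)
        if score is None:
            pruned += 1
        elif len(top) < n:
            heapq.heappush(top, (score, i))
        else:
            heapq.heapreplace(top, (score, i))
    return sorted(top, reverse=True), pruned
```

Because sparse regions are visited first, the heap tends to fill with genuinely isolated points, and most points in dense regions are then rejected after only a few distance computations.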

Summary

INTRODUCTION

With the rapid development of the Internet and the widespread use of embedded devices such as sensors, data is being collected at an unprecedented scale. Most existing outlier detection algorithms are designed for single-query processing and do not consider the sharing mechanism between multiple queries, so they can only process detection tasks one by one. Building on effective support for a single query, RODA realizes a sharing mechanism between multiple query tasks, which greatly reduces memory waste and further improves the efficiency of outlier detection for multi-query processing. Experiments on real and synthetic datasets verify that RODA accelerates single outlier queries and effectively supports multiple outlier detection queries, and that it has good practical significance.
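
To make the notion of sharing between queries concrete, here is a minimal Python sketch. This summary does not describe RODA's actual sharing mechanism, so the approach below is only an assumed illustration: each query is taken to be a pair (k, n) asking for the top-n points by k-th nearest neighbor distance, and all queries reuse one neighbor-distance computation sized for the largest k. The function name `shared_multi_query` and the query format are hypothetical.

```python
import heapq
import math


def shared_multi_query(points, queries):
    """Answer several (k, n) outlier queries from one shared computation.

    Assumes len(points) > max k. Returns one list of point indices per
    query, ordered by descending outlier score.
    """
    k_max = max(k for k, _ in queries)

    # Shared step: the k_max smallest neighbor distances of every point,
    # in ascending order, computed once for all queries.
    neighbor_dists = []
    for i, p in enumerate(points):
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        neighbor_dists.append(heapq.nsmallest(k_max, dists))

    # Per-query step: read the k-th distance from the shared lists.
    results = []
    for k, n in queries:
        scored = ((nd[k - 1], i) for i, nd in enumerate(neighbor_dists))
        results.append([i for _, i in heapq.nlargest(n, scored)])
    return results
```

Processing the queries one by one would repeat the neighbor search for each of them; here that cost is paid once and each additional query only adds a cheap selection step.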

RELATED WORK

OUTLIER DETECTION FOR MULTIPLE QUERY

RODA ALGORITHM DESCRIPTION

IF e is a non-leaf node THEN

EXPERIMENT ANALYSIS

CONCLUSION