A Mass-Based Approach for Local Outlier Detection

Anh Hoang, Van-Nam Huynh, Toan Nguyen Mau, Duc-Vinh Vo

doi:10.1109/access.2021.3053072

Abstract

This paper proposes a new outlier detection approach that measures the degree of outlierness for each instance in a given dataset. The proposed model utilizes a mass-based dissimilarity measure to address the weaknesses of neighbor-based outlier models while detecting local outliers in the dataset within a variety of data point densities. In particular, it first applies a hierarchical partitioning technique to generate a set of tree-like nested structure partitions for the input dataset, and then a mass-based dissimilarity measure is defined to quantify the dissimilarity between two data instances given the generated hierarchical partition structure. After that, for each data instance, a context set is obtained by gathering the neighbors around it with the $k$ lowest mass dissimilarities, and based on those context sets, a mass-based local outlier score model is introduced to compute the outlierness for each individual instance. The proposed approach fundamentally changes the perspective of the outlier model by using the mass-based measurement instead of the distance-based functions used in most neighbor-based methods. A comprehensive experiment conducted on both synthetic and real-world datasets demonstrates that the proposed approach is not only competitive with the existing state-of-the-art outlier detection models but is also an efficient and effective alternative for local outlier detection methods.
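The pipeline described above (hierarchical partitioning, a mass-based dissimilarity, context sets of the $k$ lowest dissimilarities, and a local score) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the partitions are random axis-parallel trees built on the full dataset, the dissimilarity of two instances is the average size of the smallest tree region containing both, divided by the dataset size, and the local score is an LOF-style ratio used as a stand-in for the paper's scoring model. The function names `tree_mass`, `mass_dissimilarity`, and `local_outlier_scores`, and the parameters `n_trees` and `max_depth`, are hypothetical.

```python
import numpy as np

def tree_mass(X, idx, mass, rng, depth, max_depth):
    """Grow one random partition tree; record, for every pair of points,
    the size of the smallest node (region) that contains both."""
    if len(idx) == 0:
        return
    # Every pair in this node shares a region of size len(idx);
    # deeper (smaller) nodes overwrite this value as we descend.
    mass[np.ix_(idx, idx)] = len(idx)
    if len(idx) == 1 or depth >= max_depth:
        return
    dim = rng.integers(X.shape[1])            # random split attribute
    lo, hi = X[idx, dim].min(), X[idx, dim].max()
    if lo == hi:                              # cannot split further
        return
    cut = rng.uniform(lo, hi)                 # random split value
    left = X[idx, dim] < cut
    tree_mass(X, idx[left], mass, rng, depth + 1, max_depth)
    tree_mass(X, idx[~left], mass, rng, depth + 1, max_depth)

def mass_dissimilarity(X, n_trees=100, max_depth=8, seed=0):
    """Average shared-region mass over n_trees trees, normalised by n."""
    rng = np.random.default_rng(seed)
    n = len(X)
    total = np.zeros((n, n))
    for _ in range(n_trees):
        mass = np.empty((n, n))               # fully filled by the root call
        tree_mass(X, np.arange(n), mass, rng, 0, max_depth)
        total += mass
    return total / (n_trees * n)

def local_outlier_scores(diss, k=5):
    """Assumed LOF-style score (not the paper's formula): mean dissimilarity
    to the context set, relative to the same quantity of its members."""
    d = diss.copy()
    np.fill_diagonal(d, np.inf)               # exclude each point from its own context
    context = np.argsort(d, axis=1)[:, :k]    # k lowest mass dissimilarities
    own = np.take_along_axis(d, context, axis=1).mean(axis=1)
    return own / own[context].mean(axis=1)

# Toy data (hypothetical): one tight cluster plus a single distant point.
demo_rng = np.random.default_rng(42)
X = np.vstack([demo_rng.normal(0.0, 0.1, size=(60, 2)), [[5.0, 5.0]]])
scores = local_outlier_scores(mass_dissimilarity(X), k=5)
print("index with highest score:", int(np.argmax(scores)))
```

This sketch stores a full n-by-n matrix per tree, so it is only suitable for small illustrative datasets. Scores near 1 indicate an instance whose context looks like that of its neighbors, while larger values indicate that the instance sits in a sparser region than its context set.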

Highlights

Outlier detection is important for a wide range of scientific and industrial processes because the costs associated with misunderstanding outliers are higher than those of other events
We introduce an alternative approach for local outlier detection by merging the hierarchical partitioning technique and the mass-based dissimilarity measurement
The decision boundaries between outliers and normal instances are displayed in green except for those of the local outlier factor (LOF) and mass-based local outlier scoring approach (MLOS) because they have no prediction function to be applied on new instances

Summary

Introduction

Outlier detection is important for a wide range of scientific and industrial processes because the costs associated with misunderstanding outliers are higher than those of other events. Compared with other data mining, machine learning, and knowledge discovery problems, outlier detection is more problematic in a variety of applications, such as intrusion detection [1], [2], credit-card fraud [3]–[5], event detection [6], [7], medical diagnosis [8]–[10], law enforcement [11], and anomaly detection [12]–[14]. A variety of approaches have previously been developed to solve this problem, including distance-based, density-based, clustering-based, ensemble-based, and learning-based approaches. The distinction among them is associated with the dissimilarity measurements used among data points to determine outlier scores.

The associate editor coordinating the review of this manuscript and approving it for publication was Ze Ji.

Objectives

Results

Conclusion