Unsupervised Outlier Detection: A Meta-Learning Algorithm Based on Feature Selection

Vasilis Papastefanopoulos, Pantelis Linardatos, Sotiris Kotsiantis

doi:10.3390/electronics10182236

Open Access

Journal: Electronics
Publication Date: Sep 12, 2021
Citations: 3
License type: CC BY 4.0

Affiliation: University of Patras

Abstract

Outlier detection refers to the problem of the identification and, where appropriate, the elimination of anomalous observations from data. Such anomalous observations can emerge for a variety of reasons, including human or mechanical errors, fraudulent behaviour, and environmental or systematic changes, occurring either naturally or purposefully. The accurate and timely detection of deviant observations allows potentially extensive problems, such as fraud or system failures, to be identified early, before they escalate. Several unsupervised outlier detection methods have been developed; however, there is no single best algorithm or family of algorithms, as each typically relies on one measure of ‘outlierness’, such as density or distance, and ignores the others. Moreover, in an unsupervised setting, the absence of ground-truth labels makes finding a single best algorithm an impossible feat even for a single given dataset. In this study, a new meta-learning algorithm for unsupervised outlier detection is introduced to mitigate this problem. The proposed algorithm, in a fully unsupervised manner, attempts not only to combine the strengths of existing techniques through ensemble voting but also to mitigate their shortcomings by employing an unsupervised feature selection strategy to identify the most informative algorithms for a given dataset. The proposed methodology was evaluated extensively through experimentation, in which it was benchmarked against a wide range of commonly used outlier detection techniques. Results obtained on a variety of widely accepted datasets demonstrated its usefulness: it topped the Friedman ranking test for both the area under the receiver operating characteristic (ROC) curve and precision metrics when averaged over five independent trials.
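The general idea described in the abstract (run several detectors, treat each detector's scores as a feature, select the most informative detectors without labels, then vote) can be illustrated with a small sketch. This is not the authors' implementation: the three toy detectors, the selection rule (keep the detectors whose rankings correlate most with the consensus ranking), and the names `meta_outlier_detect`, `keep`, and `contamination` are all assumptions made for illustration. The abstract does not specify which base detectors or which feature selection strategy the paper uses.

```python
import math
from statistics import mean, median, pstdev

def zscore_detector(X):
    """Score = Euclidean distance from the feature means, in std units."""
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(row, mu, sd)))
            for row in X]

def knn_detector(X, k=2):
    """Score = distance to the k-th nearest neighbour."""
    scores = []
    for i, a in enumerate(X):
        dists = sorted(math.dist(a, b) for j, b in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores

def median_detector(X):
    """Score = Manhattan distance from the per-feature medians."""
    med = [median(c) for c in zip(*X)]
    return [sum(abs(v - m) for v, m in zip(row, med)) for row in X]

def rank_normalise(scores):
    """Map scores to ranks in [0, 1]; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r / (len(scores) - 1)
    return ranks

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def meta_outlier_detect(X, detectors, keep=2, contamination=0.1):
    # 1. Each detector's normalised scores form one meta-feature.
    meta = [rank_normalise(det(X)) for det in detectors]
    # 2. Unsupervised selection (ASSUMED rule, not from the paper):
    #    keep the detectors most correlated with the consensus ranking.
    consensus = [mean(col) for col in zip(*meta)]
    by_agreement = sorted(range(len(meta)),
                          key=lambda i: pearson(meta[i], consensus),
                          reverse=True)
    selected = by_agreement[:keep]
    # 3. Voting: each selected detector flags its top-scoring fraction.
    n_flag = max(1, round(contamination * len(X)))
    votes = [0] * len(X)
    for i in selected:
        top = sorted(range(len(X)), key=lambda j: meta[i][j], reverse=True)
        for j in top[:n_flag]:
            votes[j] += 1
    # A point is an outlier if a strict majority of selected detectors agree.
    labels = [1 if v > len(selected) / 2 else 0 for v in votes]
    return labels, selected

# Toy data (made up for illustration): nine clustered points, one far away.
X = [[0, 0], [0.1, 0.2], [0.2, 0.1], [-0.1, 0.1], [0.1, -0.1],
     [0, 0.2], [0.2, 0], [-0.2, -0.1], [0.1, 0.1], [8, 8]]
labels, selected = meta_outlier_detect(
    X, [zscore_detector, knn_detector, median_detector])
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Rank normalisation is used here so that detectors with different score scales contribute comparably to the consensus; a real pipeline would likely substitute established detectors and a principled unsupervised feature selection method for these toy choices.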

Highlights

Outlier detection refers to the problem of the identification and, where appropriate, the removal of anomalous observations from data.
There is no official definition of what constitutes an outlier [1]; outliers can be broadly seen as observations that deviate enough from the majority of observations in a dataset to be considered the product of a different generative process.
Proposed methodology: the aim of this study is to contribute to the field of unsupervised outlier detection by proposing a novel meta-learning algorithm for outlier identification.

Summary

Introduction

Outlier detection refers to the problem of the identification and, where appropriate, the removal of anomalous observations from data. Numerous real-world applications rely on sophisticated data analyses to filter out outliers and maintain system reliability [3,4]. This is especially true in safety-critical environments, where the presence of outliers may imply abnormal activity, such as fraud, or may indicate irregular running conditions in a system, which may significantly hinder its performance and result in system failure [5,6]. Although a significant part of the literature focuses on the undesired properties of outliers, they can also reveal valuable information about previously unknown characteristics of the systems and entities that generated them. Shedding light on such characteristics and properties can provide interesting insights and potentially lead to important discoveries [7].