Comparison of outlier detection techniques in non-stationary time series data

Sampson Twumasi-Ankrah, Jonathan Kwaku Afriyie, Wilhemina Adoma Pels, Danielson Nartey, Doris Arthur, Simon Kojo Appiah

doi:10.4314/gjpas.v27i1.7

Abstract

This study examined the performance of six outlier detection techniques using a non-stationary time series dataset. Two key issues were of interest. Scenario one was to find the method that could correctly detect the number of outliers introduced into the dataset, while scenario two was to find the technique that would over-detect the number of outliers introduced into the dataset, when the dataset contains only extreme maxima, only extreme minima, or both. The Air Passengers dataset was used, with the number of introduced outliers or extreme values ranging from 1 to 10, and 40. The six outlier detection techniques used in this study were the Mahalanobis distance, depth-based, robust kernel-based outlier factor (RKOF), generalized dispersion, Kth nearest neighbors distance (KNND), and principal component (PC) methods. When detecting extreme maxima, the Mahalanobis and the principal component methods performed better in correctly detecting outliers in the dataset. Also, the Mahalanobis method could identify more outliers than the others, making it the "best" method for the extreme minima category. The Kth nearest neighbors distance method was the "best" method for not over-detecting the number of outliers for extreme minima. However, the Mahalanobis distance and the principal component methods were the "best" performing methods for not over-detecting the number of outliers for the extreme maxima category. Therefore, the Mahalanobis outlier detection technique is recommended for detecting outliers in non-stationary time series data.

Highlights

There are two notable definitions of an outlier in the literature
The two issues of interest are the correct detection and the over-detection of the number of outliers introduced into the dataset
The local reachability density (LRD) of an object p is compared with those of the neighbors of p

Summary

Introduction

There are two notable definitions of an outlier in the literature. According to Barnett and Lewis (1994), an outlier is an observation that appears to deviate from the other observations of the sample in which it occurs. Johnson (1992) defines an outlier as an observation in a dataset that appears inconsistent with the rest of the observations in that dataset. Outlier detection refers to the task of identifying patterns in data that do not conform to expected behaviors (Ané et al., 2008; Angiulli and Pizzuti, 2002). Outlier detection is widely applied in public health anomaly detection, credit card fraud detection, and intrusion detection studies, and has become of great interest in data mining.

Methods
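As an illustration of the technique the abstract recommends, the following is a minimal sketch of Mahalanobis-distance outlier detection on a trending series. It is not the authors' implementation. Several choices are assumptions made only for this example: a synthetic series with trend and seasonality stands in for the Air Passengers data, each observation is represented as the point (time index, value), the classical sample mean and covariance are used, and points are flagged when the squared distance exceeds the 99% chi-square quantile with 2 degrees of freedom (about 9.21). The function name `mahalanobis_outliers` and the injected positions are hypothetical.

```python
import numpy as np


def mahalanobis_outliers(points, threshold):
    """Flag rows whose squared Mahalanobis distance exceeds `threshold`.

    Uses the classical (non-robust) sample mean and covariance.
    Returns the flagged row indices and all squared distances.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
    diff = points - center
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.flatnonzero(d2 > threshold), d2


# Synthetic non-stationary series (assumption: stands in for the real data).
rng = np.random.default_rng(0)
t = np.arange(144)
series = 100 + 2.0 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

# Introduce three extreme maxima at hypothetical positions.
injected = np.array([30, 75, 120])
contaminated = series.copy()
contaminated[injected] += 400

# Assumed representation: each observation is the 2-D point (time, value).
points = np.column_stack([t, contaminated])
threshold = 9.21  # approx. 99% quantile of chi-square with 2 degrees of freedom
flagged, d2 = mahalanobis_outliers(points, threshold)
print("flagged indices:", flagged.tolist())
```

Comparing `flagged` with `injected` gives the two quantities the study is concerned with: how many introduced outliers are recovered, and how many additional points are flagged (over-detection). Because the classical mean and covariance are themselves affected by the outliers, a large number of introduced values can inflate the covariance and mask some of them; a robust covariance estimate is one possible variation.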

Results

Conclusion