An Overview of Multiple Outliers in Multidimensional Data

T. A. Sajesh, M. R. Srinivasan

doi:10.4038/sljastats.v14i2.6214

Abstract

Detecting outliers is an interesting and important part of data analysis, because outliers can affect inference. The literature abounds with procedures for detecting and testing single outliers in sample data. However, the presence of two or more outliers in multivariate data makes detection and testing more complicated, since the majority of outliers are invisible to many of the methods. This is due to the masking effect, which renders the regular classical and related methods unsuitable for outlier identification. The difficulty of detection increases with the number of outliers and the dimension of the data, because outliers can be extreme in a growing number of directions. This study provides an overview of multivariate outlier detection methods because of their growing importance in a wide variety of practical situations.

Highlights

Statisticians have long been interested in finding “outlying”, “unusual”, or “unrepresentative” observations as a precursor to data analysis.
Numerous robust distance-based outlier detection methods have evolved over the last two decades; the findings are presented in order.
The primary theorem proved by Rousseeuw and Van Driessen states that if one starts with a half-sample of the data, orders the entire data set by Mahalanobis distances computed from the half-sample’s mean vector and covariance matrix, and selects a new half-sample from the observations with the smallest distances, then the covariance determinant of the new half-sample is less than or equal to that of the old half-sample.
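
The step described in the theorem above, often called a concentration step, can be sketched in a few lines. This is a minimal illustration assuming NumPy, not the authors' implementation: the function names `subset_cov_det` and `concentration_step`, the simulated data, the half-sample size `h = (n + p + 1) // 2`, and the number of iterations are all assumptions made for the example.

```python
import numpy as np


def subset_cov_det(X, idx):
    """Determinant of the sample covariance matrix of the rows X[idx]."""
    return np.linalg.det(np.cov(X[idx], rowvar=False))


def concentration_step(X, idx):
    """Fit mean and covariance on X[idx], rank all rows, keep the h closest."""
    h = len(idx)
    mean = X[idx].mean(axis=0)
    cov = np.cov(X[idx], rowvar=False)
    diff = X - mean
    # Squared Mahalanobis distance of every observation to the subset fit.
    d2 = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    # Indices of the h observations with the smallest distances.
    return np.argsort(d2)[:h]


# Simulated data (an assumption for illustration): 80 clean points
# plus a cluster of 20 shifted points acting as outliers.
rng = np.random.default_rng(0)
clean = rng.normal(size=(80, 2))
shifted = rng.normal(loc=6.0, size=(20, 2))
X = np.vstack([clean, shifted])

n, p = X.shape
h = (n + p + 1) // 2  # assumed half-sample size
idx = rng.choice(n, size=h, replace=False)  # arbitrary starting half-sample

dets = [subset_cov_det(X, idx)]
for _ in range(10):
    idx = concentration_step(X, idx)
    dets.append(subset_cov_det(X, idx))

# The sequence of determinants should never increase.
print(dets)
```

Because each step cannot increase the determinant, repeating it from a starting half-sample drives the subset toward a low-determinant configuration. A single start can still settle in a local minimum, so in practice one would likely try several starting subsets.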

Summary

Introduction

Statisticians have long been interested in finding “outlying”, “unusual”, or “unrepresentative” observations as a precursor to data analysis. The average gestation period for a human female is 280 days, so the question arose whether 349 days is merely a large observation or whether that data point belongs to another population, namely that of women who conceived much later than August 28, 1944 [1]. Another example where outliers themselves are of primary importance involves air safety, as discussed in [4]. If a researcher's data collection contained other types of mosquitoes, he would not be interested in their characteristics; he would want to remove those observations or ensure that they do not influence the statistical estimates for the original population. In such a situation, the techniques should accommodate the outliers without needing to detect and reject them in the estimation; such techniques are called robust. If the variability is due to inherent variation, the point should remain.

Univariate Outliers

Multivariate Outliers

M-Estimation Method

MVE and MCD Methods

Stahel-Donoho Estimator

Hadi’s Forward Search Method

Atkinson’s Forward Search Method

Hawkins’ Feasible Solution Algorithm

Hybrid Algorithm

Smallest Half-Volume and Resampling by Half-Means Methods

Bivariate Boxplot Method

BACON Method

Kurtosis Method

OGK Method

Comedian Approach

Other Distance-based Methods

Non-Traditional Methods

Principal Component Methods

Mahalanobis Distance Decomposition Method

Projection Pursuit Detection

Juan-Prieto Method

Chiang-Pell-Seasholtz PCA Method

Other Non-Traditional Methods

Comparative Study

Findings

Conclusion


Journal: Sri Lankan Journal of Applied Statistics	Publication Date: Nov 13, 2013
Citations: 67	License type: cc-by


