Random Subspace Learning Approach to High-Dimensional Outliers Detection

Bohan Liu, Ernest Fokoué

doi:10.4236/ojs.2015.56063

Open Access

https://doi.org/10.4236/ojs.2015.56063

Journal: Open Journal of Statistics
Publication Date: Jan 1, 2015
Citations: 17
License type: CC BY 4.0

Affiliation: Rochester Institute of Technology

Abstract

We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimensional low-sample-size and traditional low-dimensional high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.

Highlights

The observations lie in a subset of $\mathbb{R}^{1 \times p}$; the special scenario in which $n \ll p$ is referred to as the high-dimensional low-sample-size (HDLSS) setting
The Random Subspace Learning (RSSL) outlier detection algorithm computes a determinant of covariance for each subsample, with each subsample residing in a subspace spanned by the d randomly selected variables, where d is usually selected to be
We have presented what we can rightfully claim to be a computationally efficient, scalable, intuitively appealing, and highly predictively accurate outlier detection method for both HDLSS and LDHSS datasets
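
The determinant-per-subsample idea in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `subspace_cov_determinants`, the subsample size `m`, the number of draws `B`, the value `d = 3`, and the toy data are all assumptions made for illustration (the rule for choosing d is cut off in the text above). The final step, keeping the draw with the smallest determinant, is an MCD-style criterion assumed here, not one confirmed by the source.

```python
import numpy as np

def subspace_cov_determinants(X, d, m, B, seed=0):
    """Hypothetical helper: for each of B draws, pick d random variables and
    m random observations, and record det(cov) in that d-dimensional subspace."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = []
    for _ in range(B):
        cols = rng.choice(p, size=d, replace=False)      # random subspace
        rows = rng.choice(n, size=m, replace=False)      # random subsample
        sub = X[np.ix_(rows, cols)]                      # m x d block of the data
        det = np.linalg.det(np.cov(sub, rowvar=False))   # determinant of d x d covariance
        draws.append((det, rows, cols))
    return draws

# Toy HDLSS data (assumed): n = 30 observations, p = 200 variables,
# with the first 3 observations shifted to act as outliers.
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 200))
X[:3] += 8.0

draws = subspace_cov_determinants(X, d=3, m=15, B=500)

# Assumed selection rule: keep the draw with the smallest determinant.
best_det, best_rows, best_cols = min(draws, key=lambda t: t[0])
print("smallest determinant:", best_det)
print("observations in that subsample:", sorted(best_rows.tolist()))
```

Each determinant here comes from a 3 x 3 matrix rather than a 200 x 200 one, which is the source of the computational saving the abstract describes; with m > d the small covariance matrices are also generically non-singular even though n < p.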

Summary

Introduction

The observations lie in a subset of $\mathbb{R}^{1 \times p}$; the special scenario in which $n \ll p$ is referred to as the high-dimensional low-sample-size (HDLSS) setting. Given a dataset with the above characteristics, the goal of all outlier detection techniques and methods is to select and isolate as many outliers as possible, so that robust statistical procedures can be performed without being adversely affected by those outliers. In such scenarios, where the multivariate Gaussian is the assumed underlying distribution, the classical Mahalanobis distance is the default measure of the proximity of the observations, namely $d_{\mu,\Sigma}(x_i) = \sqrt{(x_i - \mu)^\top \Sigma^{-1} (x_i - \mu)}$, where $\mu$ and $\Sigma$ denote the mean vector and covariance matrix. If, instead of reducing the dimensionality based on robust estimators, one first applies PCA to the whole data, outliers may surprisingly lie along several directions where they are exposed more clearly and distinctly. Such an insight appears to have motivated the creation of the so-called PCOut algorithm proposed by [3].
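
The classical Mahalanobis distance mentioned above can be sketched as follows. This is an illustrative example, not code from the paper: the function name `mahalanobis`, the toy data, and the planted outlier are assumptions. The second half shows why the classical distance breaks down in the HDLSS setting: with n < p the sample covariance has rank at most n - 1, so it cannot be inverted.

```python
import numpy as np

def mahalanobis(X, mu, Sigma):
    """Mahalanobis distance of each row of X from mu under covariance Sigma."""
    diff = X - mu
    # Solve Sigma z = diff^T rather than forming the inverse explicitly.
    z = np.linalg.solve(Sigma, diff.T)
    # Row-wise quadratic form diff_i^T Sigma^{-1} diff_i, then square root.
    return np.sqrt(np.einsum("ij,ji->i", diff, z))

rng = np.random.default_rng(0)

# Low-dimensional case (n = 100 > p = 2): the sample covariance is invertible.
X = rng.normal(size=(100, 2))
X[0] = [6.0, -6.0]                       # planted outlier (assumed toy data)
dist = mahalanobis(X, X.mean(axis=0), np.cov(X, rowvar=False))
print("largest distance at index", int(np.argmax(dist)))

# HDLSS case (n = 10 < p = 50): the sample covariance is rank deficient,
# so the classical distance is undefined without regularization or
# a reduction to lower-dimensional subspaces.
Xh = rng.normal(size=(10, 50))
rank = int(np.linalg.matrix_rank(np.cov(Xh, rowvar=False)))
print("covariance rank:", rank, "of", Xh.shape[1])
```

Note that this sketch uses the plain sample mean and covariance, which are themselves affected by outliers; robust estimators such as MCD replace them precisely for that reason.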

Rationale for Random Subspace Learning

Description of Random Subspace Learning for Outlier Detection

Justification of Random Subspace Learning for Outlier Detection

Alternatives to Parametric Outlier Detection Methods

Setup of Computational Demonstration and Initial Results

Further Results and Computational Comparisons

Conclusion

