Abstract

We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimensional low-sample-size and traditional low-dimensional high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and computational development of our approach reveal that it is computationally more efficient than regularized methods in the high-dimensional low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correctly detected outliers is concerned.

Highlights

  • Each observation $\mathbf{x}_i \in \mathbb{R}^{1\times p}$, under the special scenario in which $n \ll p$, referred to as the high-dimensional low-sample-size (HDLSS) setting

  • The Random Subspace Learning (RSSL) outlier detection algorithm computes a determinant of covariance for each subsample, with each subsample residing in a subspace spanned by the d randomly selected variables, where d is usually selected to be much smaller than n, so that each subspace covariance matrix is nonsingular and its determinant is cheap to compute (a sketch of this idea follows these highlights)

  • We have presented what we can rightfully claim to be a computationally efficient, scalable, intuitively appealing, and highly accurate outlier detection method for both HDLSS and LDHSS datasets
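
The following is a minimal, illustrative Python sketch of the random-subspace idea stated in the highlights above; it is our reconstruction, not the authors' implementation. The function name, the parameter defaults (`d`, `n_subspaces`, `n_subsamples`, `h`), and the median aggregation across subspaces are all assumptions made for demonstration.

```python
import numpy as np

def rssl_outlier_scores(X, d=5, n_subspaces=200, n_subsamples=10, h=None, seed=0):
    """Score observations by MCD-style distances aggregated over random
    d-variable subspaces; larger scores suggest outliers."""
    n, p = X.shape
    h = h if h is not None else n // 2 + 1  # row-subsample size for the MCD-like step
    rng = np.random.default_rng(seed)
    scores = np.zeros((n_subspaces, n))
    for b in range(n_subspaces):
        vars_b = rng.choice(p, size=d, replace=False)  # random subspace of d variables
        Xb = X[:, vars_b]
        # MCD-like step: among random row subsamples, keep the one whose
        # covariance has the smallest determinant (least inflated by outliers).
        best_det, best_mu, best_cov = np.inf, None, None
        for _ in range(n_subsamples):
            rows = rng.choice(n, size=h, replace=False)
            cov = np.cov(Xb[rows], rowvar=False)       # d x d, invertible since d << h
            det = np.linalg.det(cov)
            if det < best_det:
                best_det, best_mu, best_cov = det, Xb[rows].mean(axis=0), cov
        centered = Xb - best_mu
        cov_inv = np.linalg.inv(best_cov)
        # Squared Mahalanobis distance of every observation in this subspace.
        scores[b] = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
    return np.median(scores, axis=0)  # median across subspaces (an assumed aggregation)

# Toy HDLSS demonstration: n = 50 observations, p = 500 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 500))
X[:3] += 4.0                          # contaminate the first three rows
print(np.argsort(rssl_outlier_scores(X))[-3:])  # highest scores; expected: rows 0, 1, 2
```

Because every determinant and inverse here involves only a d × d matrix, the cost per subspace stays trivial even when p is in the thousands, which is the computational point made in the abstract.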


Summary

Introduction

We consider a dataset of $n$ observations $\mathbf{x}_i \in \mathbb{R}^{1\times p}$, under the special scenario in which $n \ll p$, referred to as the high-dimensional low-sample-size (HDLSS) setting. Given a dataset with the above characteristics, the goal of all outlier detection techniques and methods is to select and isolate as many outliers as possible, so as to perform robust statistical procedures that are not adversely affected by those outliers. In such scenarios, where the multivariate Gaussian is the assumed basic underlying distribution, the classical Mahalanobis distance is the default measure of the proximity of the observations, namely $d(\mathbf{x}_i) = \sqrt{(\mathbf{x}_i - \hat{\boldsymbol{\mu}})\,\hat{\Sigma}^{-1}(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^{\top}}$, where $\hat{\boldsymbol{\mu}}$ and $\hat{\Sigma}$ are estimates of the mean vector and covariance matrix. If, instead of reducing the dimensionality based on robust estimators, one first applies PCA to the whole dataset, outliers may, perhaps surprisingly, lie along several principal directions in which they are exposed more clearly and distinctly. Such an insight appears to have motivated the so-called PCOut algorithm proposed by [3].
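
For concreteness, here is a minimal Python sketch (ours, not the authors') of the classical Mahalanobis distance just described; the function name is hypothetical.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X from the sample mean, in the covariance metric."""
    mu = X.mean(axis=0)              # estimate of the mean vector
    cov = np.cov(X, rowvar=False)    # p x p sample covariance; nonsingular only if n > p
    cov_inv = np.linalg.inv(cov)
    centered = X - mu
    # d_i^2 = (x_i - mu) cov^{-1} (x_i - mu)^T for each row i
    d2 = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
    return np.sqrt(d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # n = 200 >> p = 5, so cov is invertible
X[0] += 6.0                          # plant one obvious outlier in row 0
print(mahalanobis_distances(X).argmax())  # expected: 0
```

Note that the inversion fails outright when $p \ge n$, since the sample covariance is singular; this is precisely the HDLSS obstruction that the random subspace construction is designed to sidestep.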

Rationale for Random Subspace Learning
Description of Random Subspace Learning for Outlier Detection
Justification of Random Subspace Learning for Outlier Detection
Alternatives to Parametric Outlier Detection Methods
Setup of Computational Demonstration and Initial Results
Further Results and Computational Comparisons
Conclusion