Abstract
We introduce and develop a novel approach to outlier detection based on an adaptation of random subspace learning. Our proposed method handles both high-dimensional low-sample-size and traditional low-dimensional high-sample-size datasets. Essentially, we avoid the computational bottleneck of techniques like the Minimum Covariance Determinant (MCD) by computing the needed determinants and associated measures in much lower-dimensional subspaces. Both the theoretical and the computational development of our approach reveal that it is computationally more efficient than the regularized methods in the high-dimensional low-sample-size setting, and that it often competes favorably with existing methods as far as the percentage of correct outlier detection is concerned.
Highlights
⊂ ℝ^(1×p), under the special scenario in which n ≪ p, referred to as the high-dimensional low-sample-size (HDLSS) setting
The Random Subspace Learning (RSSL) outlier detection algorithm computes a determinant of covariance for each subsample, with each subsample residing in a subspace spanned by the d randomly selected variables, where d is usually selected to be
We have presented what we can rightfully claim to be a computationally efficient, scalable, intuitively appealing and highly accurate outlier detection method for both HDLSS and LDHSS datasets
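As a rough illustration of the subsample-and-subspace step described in the highlights, the following Python sketch draws d variables at random for each of several subsamples and computes the log-determinant of the sample covariance in that subspace. The function name `subspace_log_determinants`, the subsample size `m`, the number of subsamples, and the choice to sample rows without replacement are assumptions made for this example; the text above does not specify them, and the value of d is left to the caller. How the determinants are then turned into an outlier decision is not shown.

```python
import numpy as np

def subspace_log_determinants(X, d, m, n_subsamples=100, seed=0):
    """Log-determinants of covariance matrices in random d-dimensional subspaces.

    X            : (n, p) data matrix
    d            : number of randomly selected variables per subspace (d < m)
    m            : rows per subsample (hypothetical parameter, an assumption)
    n_subsamples : number of subsamples to draw (an assumption)
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    log_dets = np.empty(n_subsamples)
    subspaces = []
    for b in range(n_subsamples):
        # Assumption: rows are drawn without replacement.
        rows = rng.choice(n, size=m, replace=False)
        # Randomly select d of the p variables to span the subspace.
        cols = rng.choice(p, size=d, replace=False)
        # d x d covariance matrix; cheap even when p is very large.
        S = np.atleast_2d(np.cov(X[np.ix_(rows, cols)], rowvar=False))
        sign, logdet = np.linalg.slogdet(S)
        # A non-positive determinant signals a degenerate subsample.
        log_dets[b] = logdet if sign > 0 else np.inf
        subspaces.append(cols)
    return log_dets, subspaces
```

Working with the log-determinant rather than the determinant itself is a design choice of this sketch to avoid numerical underflow; each determinant costs on the order of d³ operations instead of p³.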
Summary
⊂ ℝ^(1×p), under the special scenario in which n ≪ p, referred to as the high-dimensional low-sample-size (HDLSS) setting. Given a dataset with the above characteristics, the goal of all outlier detection techniques and methods is to select and isolate as many outliers as possible, so that robust statistical procedures can be performed without being adversely affected by those outliers. In such scenarios, where the multivariate Gaussian is the assumed underlying distribution, the classical Mahalanobis distance is the default measure of the proximity of the observations, namely d(x_i) = sqrt((x_i − μ)ᵀ Σ⁻¹ (x_i − μ)), with μ the mean vector and Σ the covariance matrix. If, instead of reducing the dimensionality based on robust estimators, one first applies PCA to the whole data, outliers may surprisingly lie along several directions where they are exposed more clearly and distinctly. Such an insight appears to have motivated the creation of the so-called PCOut algorithm proposed by [3].
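The classical Mahalanobis distance mentioned above, in its standard form sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), can be sketched in a few lines of Python. The function name `mahalanobis_distances` and the default of plugging in the sample mean and sample covariance are assumptions for illustration only. Note that when n < p the sample covariance is singular, so this classical form cannot be applied directly to the full set of variables in the HDLSS setting.

```python
import numpy as np

def mahalanobis_distances(X, mu=None, sigma=None):
    """Mahalanobis distance of every row of X from mu under covariance sigma.

    If mu or sigma is omitted, the sample mean / sample covariance of X
    is used (an assumption of this sketch, not a robust estimator).
    """
    X = np.asarray(X, dtype=float)
    if mu is None:
        mu = X.mean(axis=0)
    if sigma is None:
        sigma = np.cov(X, rowvar=False)
    centered = X - mu
    # Solve sigma @ z = centered.T rather than inverting sigma explicitly.
    solved = np.linalg.solve(sigma, centered.T)
    # Row-wise quadratic form (x - mu)^T sigma^{-1} (x - mu).
    sq = np.einsum("ij,ji->i", centered, solved)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(sq, 0.0))
```

Observations whose distance is large relative to the rest are the natural outlier candidates; the cutoff used to declare an outlier is not part of this sketch.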