Abstract

The classical Mahalanobis distance is a common tool for detecting outliers, but it is itself affected by outliers. Robust Mahalanobis distances based on the fast MCD estimator have been proposed. However, the bias of the MCD estimator increases significantly as the dimension increases. In this paper, we propose an improved Mahalanobis distance based on the more robust Rocke estimator for high-dimensional data. Numerical simulations and an empirical analysis show that, when the data contain outliers and the dimension is very high, the proposed method detects outliers better than the two methods above.
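As a rough illustration of the classical approach mentioned above, the following sketch computes squared Mahalanobis distances from the sample mean and sample covariance and flags points above a chi-square cutoff. It is not the method proposed in the paper: it is restricted to two dimensions so that the covariance matrix can be inverted in closed form, the 0.975 cutoff level is an assumed convention, and all function names and the toy data are hypothetical.

```python
import math

def mean_and_cov(points):
    """Sample mean and sample covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For 2 degrees of freedom the CDF is 1 - exp(-x / 2), so it inverts exactly.
    """
    return -2.0 * math.log(1.0 - p)

def classical_outliers(points, level=0.975):
    """Indices whose squared distance exceeds the cutoff (assumed level)."""
    center, cov = mean_and_cov(points)
    cutoff = chi2_quantile_2df(level)
    d2 = [squared_mahalanobis(pt, center, cov) for pt in points]
    return [i for i, value in enumerate(d2) if value > cutoff], d2

# Toy data (hypothetical): a 3x3 grid around the origin plus one planted point.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (1, 1), (-1, -1), (1, -1), (-1, 1), (10, 10)]
flagged, d2 = classical_outliers(points)
print(flagged, [round(v, 3) for v in d2])
```

Note that the mean and covariance here are computed from all points, including the planted one, so the planted point pulls the center toward itself and inflates the covariance. This is the sensitivity that robust estimators of location and scatter, such as MCD or the Rocke estimator, are meant to reduce.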

Highlights

  • With the advancement of information technology, all fields have gradually entered the era of big data

  • This study demonstrates that a semi-deterministic equivariant procedure, initially proposed by Peña and Prieto (2007) [18] for outlier detection, dramatically improves both the computing times and the statistical performance of the estimators

  • The above results show that, in the 6-dimensional data setting and in the absence of outliers, the results of the two robust Mahalanobis distances are substantially consistent with those of the classical Mahalanobis distance

Summary

Introduction

With the advancement of information technology, all fields have gradually entered the era of big data. Research data in many fields have become more comprehensive, which makes the data harder to process. The more variables that need to be detected, collected and processed, the greater the probability of errors and the larger the number of outliers in the data. Data are affected by various complicated and uncertain factors, and outliers also arise from limited instrument accuracy, statistical omissions, and operational errors. Outliers generally need to be detected before the data are analyzed. Because outliers are more likely to occur in high-dimensional data, detecting them there is all the more necessary. For one-dimensional data, there are many methods for determining outliers, e.g., three standard deviation

Li et al DOI
Application of Mahalanobis Distance
The Principles and Algorithms of Rocke Estimator
Numerical Simulation Examples
Empirical Analysis
Findings
Discussion and Conclusion
