Abstract

This paper evaluates the performance of outlier detection methods on asymmetrically distributed datasets. We choose four estimators, viz. the modified Stahel-Donoho (MSD) estimators, blocked adaptive computationally efficient outlier nominators (BACON), the minimum covariance determinant estimator obtained by a fast algorithm (Fast-MCD), and the nearest-neighbour variance estimator (NNVE), which are known for their good performance with elliptically distributed data, with practical applications in national survey data processing in mind. For the evaluation, we adopt a multivariate skew-t data model in which only the direction of the main axis is skewed, contaminated with outliers that follow another probability distribution. We conduct Monte Carlo simulations under this data model to compare outlier detection performance. We also explore the applicability of the selected methods to several accounting items in small and medium enterprise survey data. The results indicate that the MSD estimators are the most suitable.

Highlights

  • We discuss the performance of multivariate outlier detection methods applied to asymmetric data

  • The performance of outlier detection methods is evaluated with asymmetrically distributed datasets

  • Modified Stahel-Donoho (MSD) appears to perform better than blocked adaptive computationally efficient outlier nominators (BACON), followed by Fast-MCD and the nearest-neighbour variance estimator (NNVE)

Summary

Objective

We discuss the performance of multivariate outlier detection methods applied to asymmetric data. Whereas in aggregated data the relations between variables become less visible, microdata preserve such relations. Wada (2004) compared BACON, the nearest-neighbour variance estimator (NNVE) of Wang and Raftery (2002), and the Fast-MCD estimator, using asymmetrically contaminated normal and skew-t data as well as well-known outlier detection datasets such as Hertzsprung-Russell (Rousseeuw and Leroy 1987), Bushfire (Campbell 1989), and Ionosphere from the UCI Machine Learning Repository, to identify the differences among these methods.
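
As an illustration of the kind of decision rule such comparisons rely on, the following minimal Python sketch flags points whose squared Mahalanobis distance from a robust centre exceeds a chi-square quantile. It is not an implementation of MSD, BACON, Fast-MCD, or NNVE: the coordinatewise median and a MAD-based diagonal scatter are used here only as a crude stand-in for a robust estimator, and all function names, the 0.999 cutoff level, and the toy data are assumptions made for this example.

```python
import math
import random
from statistics import median


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)


def squared_mahalanobis_2d(point, center, cov):
    """Squared Mahalanobis distance of a 2-D point for a 2x2 scatter matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - center[0], point[1] - center[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def crude_robust_center_and_scatter(points):
    """Stand-in estimator (assumption): coordinatewise median and MAD-based diagonal scatter."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = median(xs), median(ys)
    # 1.4826 makes the MAD consistent with the standard deviation under normality.
    sx = 1.4826 * median([abs(x - mx) for x in xs])
    sy = 1.4826 * median([abs(y - my) for y in ys])
    return (mx, my), ((sx * sx, 0.0), (0.0, sy * sy))


def flag_outliers(points, p=0.999):
    """Return one boolean per point: True if its squared distance exceeds the cutoff."""
    center, cov = crude_robust_center_and_scatter(points)
    cutoff = chi2_quantile_2df(p)
    return [squared_mahalanobis_2d(pt, center, cov) > cutoff for pt in points]


# Toy data: 200 clean standard-normal points followed by 5 planted outliers near (8, 8).
random.seed(0)
points = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]
points += [(8.0 + random.gauss(0.0, 0.5), 8.0 + random.gauss(0.0, 0.5)) for _ in range(5)]

flags = flag_outliers(points)
print("flagged:", sum(flags), "of", len(points))
```

A diagonal scatter ignores correlation between variables, which is one reason the estimators compared in the paper are preferred in practice; the sketch only shows how a robust location and scatter turn into an outlier flag.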

Multivariate outlier detection methods for evaluation
MSD estimators
Fast-MCD estimator
Random datasets
Results of the random datasets
Application to a survey data
Unincorporated Enterprise Survey
Data transformation and settings of outlier detection
Results obtained with the survey data
Method
Conclusion and future work