Treatment of Multivariate Outliers in Incomplete Business Survey Data

Marc Bill, Beat Hulliger

doi:10.17713/ajs.v45i1.86

Abstract

The distribution of multivariate quantitative survey data is usually not normal. Skewed and semi-continuous distributions occur often. In addition, missing values and non-response are common. Altogether, this mix of problems makes multivariate outlier detection difficult. Examples of surveys where these problems occur are most business surveys and some household surveys, such as the Survey for the Statistics of Income and Living Conditions (SILC) of the European Union. Several methods for multivariate outlier detection are collected in the R package modi. This paper gives an overview of modi and its functions for outlier detection and the corresponding imputation. The use of the methods is explained with a business survey dataset. The discussion covers pre- and post-processing to deal with skewness and zero-inflation, the advantages and disadvantages of the methods, and the choice of parameters.

Highlights

In surveys on monetary values, several monetary variables are often collected in order to capture the economic situation of an entity.
Several multivariate outlier detection and imputation procedures are contained in version 1.6 of the package modi.
The sepe data set was first prepared for the FP5 project EUREDIT (Charlton 2003) and was later used as protected data for educational purposes. For this demonstration of the modi package, we focus on 8 variables representing the most important expenditure and investment areas.

Summary

Introduction

In surveys on monetary values, several monetary variables are often collected in order to capture the economic situation of an entity. This holds for business surveys, where many particular types of expenditure may be asked about. Non-monetary quantitative variables may also be collected, such as various health indicators in a health survey, or physical production parameters in a business survey or in a survey on farm livestock. All these surveys have some common features: they have a complex sample design including stratification and possibly sub-sampling; they have elaborate questionnaires; they have unit and item non-response; and they typically have zero-inflated distributions because of the multi-faceted economic situation.
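To make the combination of zero-inflation, skewness, and item non-response concrete, the following minimal Python sketch works on a single made-up expenditure variable. It is not one of the methods in modi, which are multivariate and written in R. The data, the function names, the log(1 + x) transformation, and the cut-off of 3.5 are all assumptions chosen for illustration only.

```python
import math
from statistics import median

# Hypothetical expenditure variable: exact zeros (zero-inflation),
# None for item non-response, and one suspiciously large value.
expenditures = [0, 0, 120, None, 95, 150, 0, 110, 25000, None, 130]


def zero_share(values):
    """Share of exact zeros among the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    return sum(1 for v in observed if v == 0) / len(observed)


def log1p_keep_missing(values):
    """Apply log(1 + x) to observed values and keep missing values as None."""
    return [None if v is None else math.log1p(v) for v in values]


def robust_scores(values):
    """Robust z-scores based on the median and MAD of the positive values.

    Zeros and missing values get no score (None), so the zero spike
    does not distort the estimates of location and scale.
    """
    positive = [v for v in values if v is not None and v > 0]
    center = median(positive)
    # 1.4826 makes the MAD consistent with the standard deviation
    # under a normal distribution.
    scale = 1.4826 * median(abs(v - center) for v in positive)
    return [
        None if v is None or v == 0 else (v - center) / scale
        for v in values
    ]


transformed = log1p_keep_missing(expenditures)
scores = robust_scores(transformed)

# Flag positions whose absolute robust score exceeds the assumed cut-off.
CUTOFF = 3.5
flagged = [i for i, s in enumerate(scores) if s is not None and abs(s) > CUTOFF]
print("share of zeros:", zero_share(expenditures))
print("flagged positions:", flagged)
```

Excluding the zeros before estimating location and scale matters here: if more than half of the observed values were zero, a median-based scale estimate on the full variable would collapse to zero. A univariate check like this also cannot see observations that are only unusual in combination with other variables, which is the case that multivariate methods address.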

Overview of the modi package

The SEPE data set

Applying the methods

EA – Epidemic Algorithm

Findings

Conclusion


Journal: Austrian Journal of Statistics	Publication Date: Feb 29, 2016
Citations: 4	License type: cc-by


