Abstract

The purpose of this manuscript is to survey important methods for addressing outliers in the production of official statistics. Outliers are often unavoidable in survey statistics: they can reduce the information in survey datasets and distort estimation at each step of the statistical production process. This paper defines the outliers of concern at each production step and introduces practical methods to cope with them. The production process is roughly divided into three steps. The first step is data cleaning, where the outliers of concern are values that may contain mistakes to be corrected; robust estimators of the mean vector and covariance matrix are introduced for this purpose. The next step is imputation. Among the variety of imputation methods, regression and ratio imputation are the subjects of this paper. The outliers of concern in this step are not erroneous but have extreme values that may distort parameter estimation; robust estimators that are unaffected by such remaining outliers are introduced. The final step is estimation and formatting. Care is needed with outliers that combine extreme values with large design weights, since they have a considerable influence on the final statistical products. Weight calibration methods that control this influence are discussed, based on the robust weights obtained in the preceding imputation step. A few examples of practical application are also provided briefly, although the multivariate outlier detection methods introduced in this paper are mostly still at the research stage in the field of official statistics.

Highlights

  • Outliers are extreme or atypical values that can reduce and distort the information in a dataset

  • The yellow-colored rectangular area shows the thresholds according to the three-sigma rule; the green area shows the thresholds identified by the box-and-whisker method

  • The red probability ellipses are drawn using modified Stahel-Donoho (MSD) estimators produced by robust principal component analysis (PCA) based on Béguin and Hulliger (2003)
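The two univariate threshold rules mentioned in the highlights can be sketched in a few lines. This is a minimal illustration with invented sample data, not code from the paper:

```python
import statistics

def three_sigma_bounds(data):
    """Interval mean ± 3 standard deviations (the three-sigma rule)."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return m - 3 * s, m + 3 * s

def boxplot_bounds(data):
    """Tukey box-and-whisker fences: quartiles ± 1.5 × IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Invented toy sample: one gross outlier (25.0) among values near 10.
values = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 25.0]

lo, hi = boxplot_bounds(values)
flagged = [v for v in values if v < lo or v > hi]   # the fences flag 25.0

lo3, hi3 = three_sigma_bounds(values)
masked = [v for v in values if v < lo3 or v > hi3]  # three-sigma flags nothing here
```

On this sample the box-and-whisker fences flag 25.0 while the three-sigma rule does not, because the outlier itself inflates the standard deviation (the masking effect), which is one motivation for the robust methods discussed in the paper.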

Introduction

1.1 What are outliers

Outliers are extreme or atypical values that can reduce and distort the information in a dataset. The problem of how to deal with outliers has long been a concern. Barnett and Lewis 3) devised a principle to accommodate outliers using robust methods of inference, allowing the use of all the data while alleviating the undue influence of outliers. We follow this principle and focus on the robust statistical methods introduced by Huber (1964), which are the most suitable for survey data processing. Statistical tests are beyond the scope of our discussion.
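As a concrete illustration of Huber's (1964) approach, the location M-estimator can be computed by iteratively reweighted averaging: observations near the current estimate receive full weight, while extreme values are downweighted. The sketch below is our own illustration with invented data and a standard MAD-based scale, not code from the paper (it assumes the MAD of the sample is nonzero):

```python
import statistics

def huber_mean(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location via iteratively reweighted averaging.

    Values within c scale units of the current estimate get weight 1;
    more extreme values get weight c*scale/|x - mu|, limiting their
    influence on the result.
    """
    # Robust scale: median absolute deviation (MAD), rescaled by 1.4826
    # to be consistent with the standard deviation under normality.
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad  # assumed nonzero for this sketch

    mu = med  # robust starting point
    for _ in range(max_iter):
        weights = [min(1.0, c * scale / abs(x - mu)) if x != mu else 1.0
                   for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu

# Invented toy sample: one gross outlier among values near 10.
values = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 25.0]
robust = huber_mean(values)        # stays near 10
naive = statistics.mean(values)    # pulled toward the outlier (≈ 11.9)
```

The robust estimate stays close to the bulk of the data, whereas the ordinary mean is pulled toward the outlier, which is the "undue influence" the principle above seeks to alleviate.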
