Abstract

The purpose of this manuscript is to survey important methods for addressing outliers in the production of official statistics. Outliers are often unavoidable in survey statistics: they can reduce the information in survey datasets and distort estimation at each step of the statistical production process. This paper defines the outliers of concern at each production step and introduces practical methods to cope with them. The production process is roughly divided into three steps. The first step is data cleaning, where the outliers of concern are values that may contain mistakes to be corrected; robust estimators of the mean vector and covariance matrix are introduced for this purpose. The next step is imputation. Among the variety of imputation methods, regression and ratio imputation are the subjects of this paper. The outliers of concern in this step are not erroneous but have extreme values that may distort parameter estimation; robust estimators that are unaffected by such remaining outliers are introduced. The final step is estimation and formatting. Care is needed with outliers that combine extreme values with large design weights, since they have a considerable influence on the final statistical products. Weight calibration methods that control this influence are discussed, based on the robust weights obtained in the preceding imputation step. A few examples of practical application are also provided briefly, although the multivariate outlier detection methods introduced in this paper are mostly still at the research stage in the field of official statistics.

Highlights

  • Outliers are extreme or atypical values that can reduce and distort the information in a dataset

  • The yellow-colored rectangular area shows the thresholds according to the three-sigma rule; the green area shows the thresholds identified by the box-and-whisker method

  • The red probability ellipses are drawn using modified Stahel-Donoho (MSD) estimators produced by robust principal component analysis (PCA) based on Béguin and Hulliger (2003)
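The two univariate threshold rules mentioned in the highlights can be sketched in a few lines. This is a minimal illustration with invented sample data, not code from the paper:

```python
import statistics

def three_sigma_bounds(data):
    """Interval mean ± 3 standard deviations (the three-sigma rule)."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return m - 3 * s, m + 3 * s

def boxplot_bounds(data):
    """Tukey box-and-whisker fences: quartiles ± 1.5 × IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Invented toy sample: one gross outlier (25.0) among values near 10.
values = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 25.0]

lo, hi = boxplot_bounds(values)
flagged = [v for v in values if v < lo or v > hi]   # the fences flag 25.0

lo3, hi3 = three_sigma_bounds(values)
masked = [v for v in values if v < lo3 or v > hi3]  # three-sigma flags nothing here
```

On this sample the box-and-whisker fences flag 25.0 while the three-sigma rule does not, because the outlier itself inflates the standard deviation (the masking effect), which is one motivation for the robust methods discussed in the paper.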

Introduction

1.1 What are outliers

Outliers are extreme or atypical values that can reduce and distort the information in a dataset. The problem of how to deal with outliers has long been a concern. Barnett and Lewis 3) devised a principle to accommodate outliers using robust methods of inference, allowing the use of all the data while alleviating the undue influence of outliers. We follow this principle and focus on the robust statistical methods introduced by Huber (1964), which are the most suitable for survey data processing. Statistical tests are beyond the scope of our discussion.
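As a concrete illustration of Huber's (1964) approach, the location M-estimator can be computed by iteratively reweighted averaging: observations near the current estimate receive full weight, while extreme values are downweighted. The sketch below is our own illustration with invented data and a standard MAD-based scale, not code from the paper (it assumes the MAD of the sample is nonzero):

```python
import statistics

def huber_mean(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator of location via iteratively reweighted averaging.

    Values within c scale units of the current estimate get weight 1;
    more extreme values get weight c*scale/|x - mu|, limiting their
    influence on the result.
    """
    # Robust scale: median absolute deviation (MAD), rescaled by 1.4826
    # to be consistent with the standard deviation under normality.
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    scale = 1.4826 * mad  # assumed nonzero for this sketch

    mu = med  # robust starting point
    for _ in range(max_iter):
        weights = [min(1.0, c * scale / abs(x - mu)) if x != mu else 1.0
                   for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return mu

# Invented toy sample: one gross outlier among values near 10.
values = [9.8, 10.1, 10.4, 9.9, 10.0, 10.2, 9.7, 25.0]
robust = huber_mean(values)        # stays near 10
naive = statistics.mean(values)    # pulled toward the outlier (≈ 11.9)
```

The robust estimate stays close to the bulk of the data, whereas the ordinary mean is pulled toward the outlier, which is the "undue influence" the principle above seeks to alleviate.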
