Small-Area Estimation with Zero-Inflated Data – a Simulation Study

Sabine Krieg, Marc Smeets, Harm Jan Boonstra

doi:10.1515/jos-2016-0051


Open Access



Journal: Journal of Official Statistics	Publication Date: Nov 23, 2016
Citations: 3	License type: CC BY-NC-ND 4.0

Affiliation: Centraal Bureau voor de Statistiek

Abstract

Many target variables in official statistics follow a semicontinuous distribution: a mixture of zeros and continuously distributed positive values. Such variables are called zero inflated. When reliable estimates are required for subpopulations with small sample sizes, model-based small-area estimators can be used, which improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first is the EBLUP, which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. The model underlying the EBLUP is therefore misspecified for zero-inflated variables. The other two small-area estimators are based on a model that takes zero inflation explicitly into account. Both a Bayesian and a frequentist approach are considered. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Two simulations are carried out: one with artificial data and one with real data from the Dutch Household Budget Survey. The small-area estimators are found to improve accuracy relative to the design-based estimator. The amount of improvement depends strongly on the properties of the population and of the subpopulations of interest.
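
To make the notion of a zero-inflated (semicontinuous) variable concrete, the following minimal Python sketch simulates one across several small areas and checks that its mean factors into the share of positive values times the mean of the positive values. It is an illustration only, not the simulation design of the article: the function names, the logistic link for the zero part, the lognormal distribution for the positive part, and all parameter values are assumptions introduced here.

```python
import numpy as np

def simulate_zero_inflated(n_areas=20, n_per_area=15, seed=0):
    """Simulate a semicontinuous variable: zeros mixed with positive values.

    All distributional choices below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical area-level random effects, one per model part.
    u_zero = rng.normal(0.0, 0.5, n_areas)  # shifts the chance of a positive value
    u_pos = rng.normal(0.0, 0.3, n_areas)   # shifts the log-scale mean of positives
    area = np.repeat(np.arange(n_areas), n_per_area)
    # Part 1: logistic probability that a unit has a positive value.
    p_positive = 1.0 / (1.0 + np.exp(-u_zero[area]))
    is_positive = rng.random(area.size) < p_positive
    # Part 2: lognormal size of the value, used only where the unit is positive.
    positive_part = rng.lognormal(mean=1.0 + u_pos[area], sigma=0.5)
    y = np.where(is_positive, positive_part, 0.0)
    return area, y

def two_part_mean(y):
    """Return P(y > 0), the mean of the positive values, and their product."""
    share_positive = np.mean(y > 0)
    mean_positive = y[y > 0].mean() if share_positive > 0 else 0.0
    return share_positive, mean_positive, share_positive * mean_positive

area, y = simulate_zero_inflated()
share, mean_pos, product = two_part_mean(y)
print(f"share of zeros: {1 - share:.2f}")
print(f"two-part mean: {product:.3f}, direct mean: {y.mean():.3f}")
```

The two printed means agree because the zeros contribute nothing to the sum. A model that handles zero inflation explicitly can exploit this factorization by modelling the two parts separately, whereas a single normal linear mixed model has to describe both the spike at zero and the positive values with one distribution.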

Full Text