Robust Estimation for a Generalised Ratio Model

Kazumi Wada, Hiroe Tsubaki, Keiichiro Sakashita

doi:10.17713/ajs.v50i1.994

Abstract

It is known that data such as business sales and household income need to be transformed before regression estimation because they have heteroscedastic errors. However, data transformation makes the estimation of means and totals unstable. The ratio model is therefore often used for imputation in official statistics to avoid this problem. Our study aims to robustify the estimator following the ratio model by means of M-estimation. Reformulating the conventional ratio model with a homoscedastic quasi-error term provides quasi-residuals, which can be used as a measure of outlyingness in the same way as residuals in a linear regression model. A generalisation of the model, which accommodates error terms with different forms of heteroscedasticity, is also proposed. Functions for the robustified estimators of the generalised ratio model are implemented with the iteratively re-weighted least squares algorithm in the R environment and illustrated using random datasets. Monte Carlo simulation confirms the accuracy of the proposed estimators as well as their computational efficiency. The average absolute deviation (AAD) and the median absolute deviation (MAD) are compared as scale parameters for Tukey's biweight function; results with Huber's weight function are also provided for reference. The proposed robust estimator of the generalised ratio model is used to impute major corporate accounting items in the 2016 Economic Census for Business Activity in Japan.
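The abstract describes M-estimation of the ratio model by iteratively re-weighted least squares (IRLS), with quasi-residuals serving as the measure of outlyingness. The Python sketch below illustrates that idea; it is not the authors' R implementation. Several choices are assumptions made for illustration: the generalised model is parametrised as $\mathrm{Var}(\epsilon_i) \propto x_i^{2\gamma}$, so that $\gamma = 1/2$ gives the conventional ratio model; the quasi-residual is taken as $(y_i - \beta x_i)/x_i^{\gamma}$; the tuning constants (4.685 for Tukey, 1.345 for Huber), the normal-consistency factors for AAD and MAD, and the median starting value are common defaults. The function `robust_ratio` and its parameters are hypothetical names.

```python
import numpy as np

def tukey_weights(u, c=4.685):
    """Tukey biweight weights for standardised residuals u (zero for |u| >= c)."""
    w = (1.0 - (u / c) ** 2) ** 2
    w[np.abs(u) >= c] = 0.0
    return w

def huber_weights(u, c=1.345):
    """Huber weights: 1 for |u| <= c, c/|u| otherwise."""
    a = np.abs(u)
    w = np.ones_like(a)
    w[a > c] = c / a[a > c]
    return w

def scale_aad(r):
    """Average absolute deviation about the mean, times sqrt(pi/2) (assumed normal-consistency factor)."""
    return np.sqrt(np.pi / 2) * np.mean(np.abs(r - np.mean(r)))

def scale_mad(r):
    """Median absolute deviation about the median, times 1.4826 (normal-consistency factor)."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def robust_ratio(x, y, gamma=0.5, weight="tukey", scale="aad", tol=1e-8, max_iter=100):
    """Hypothetical IRLS sketch for y_i = beta*x_i + e_i, assuming Var(e_i) proportional to x_i**(2*gamma)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    wfun = tukey_weights if weight == "tukey" else huber_weights
    sfun = scale_aad if scale == "aad" else scale_mad
    beta = np.median(y / x)  # robust starting value
    for _ in range(max_iter):
        r = (y - beta * x) / x ** gamma  # quasi-residuals, homoscedastic under the assumed model
        s = sfun(r)
        if s == 0:
            break
        w = wfun(r / s)
        # weighted least squares for y/x**gamma = beta * x**(1 - gamma)
        new_beta = np.sum(w * y * x ** (1 - 2 * gamma)) / np.sum(w * x ** (2 - 2 * gamma))
        converged = abs(new_beta - beta) <= tol * abs(beta)
        beta = new_beta
        if converged:
            break
    return beta

# Illustration on simulated data (true beta = 2, Var(e_i) proportional to x_i).
rng = np.random.default_rng(1)
x = rng.uniform(1, 100, 200)
y = 2.0 * x + rng.normal(0, 1, 200) * np.sqrt(x)
y[np.argsort(x)[-10:]] *= 10  # contaminate the 10 units with the largest x
beta_classic = y.sum() / x.sum()  # conventional (non-robust) ratio estimator
beta_rob = robust_ratio(x, y)     # Tukey biweight with AAD scale
```

Starting from the median of $y/x$ rather than the classical ratio is deliberate: the biweight is redescending, so IRLS started from a badly contaminated estimate can settle on a poor solution.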

Highlights

Ratio imputation is a special case of regression imputation (De Waal, Pannekoek, and Scholtus (2011), pp.244–245)
We describe the estimators A, B, and C with $\sigma_{AAD}$ as $A_{AAD}$, $B_{AAD}$, and $C_{AAD}$, and those with $\sigma_{MAD}$ as $A_{MAD}$, $B_{MAD}$, and $C_{MAD}$
The proposed generalised ratio model broadens the conventional definition of the ratio model with regard to the variance of the error term
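To make the generalisation concrete, assume (for illustration only; this need not be the paper's exact parametrisation) that $\mathrm{Var}(\epsilon_i) \propto x_i^{2\gamma}$. Dividing the model by $x_i^{\gamma}$ gives a homoscedastic regression of $y_i/x_i^{\gamma}$ on $x_i^{1-\gamma}$, whose least-squares solution is $\hat\beta = \sum y_i x_i^{1-2\gamma} / \sum x_i^{2-2\gamma}$. The hypothetical function `generalised_ratio` below checks three familiar special cases on a toy dataset.

```python
import numpy as np

def generalised_ratio(x, y, gamma):
    """Least-squares beta for y_i = beta*x_i + e_i, assuming Var(e_i) proportional to x_i**(2*gamma)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(y * x ** (1 - 2 * gamma)) / np.sum(x ** (2 - 2 * gamma))

x = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.0, 5.0, 9.0, 15.0])

b_half = generalised_ratio(x, y, 0.5)  # conventional ratio estimator: sum(y) / sum(x)
b_one = generalised_ratio(x, y, 1.0)   # mean of the individual ratios y_i / x_i
b_zero = generalised_ratio(x, y, 0.0)  # OLS through the origin: sum(x*y) / sum(x**2)
```

Under this parametrisation the conventional ratio model is the single point $\gamma = 1/2$, and varying $\gamma$ moves continuously between these estimators.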

Summary

Introduction

Ratio imputation is a special case of regression imputation (De Waal, Pannekoek, and Scholtus (2011), pp.244–245). When the target variable y has missing values, an observed auxiliary variable x is used to estimate the missing values of y; x must be chosen from variables that are highly correlated with y. The imputation model is

$$y_i = \beta x_i + \epsilon_i, \tag{1}$$

where $\epsilon_i$ is the error term. Of the N units in the imputation class, (x, y) is observed for n units. The true ratio, $\beta = \bar{y}/\bar{x}$ computed over all N units, is usually unknown because some values of y are missing.
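A minimal sketch of ratio imputation under model (1), assuming the conventional estimator $\hat\beta = \sum y_i / \sum x_i$ taken over the n respondents and missing y values coded as NaN. The function name `ratio_impute` and the toy data are hypothetical.

```python
import numpy as np

def ratio_impute(x, y):
    """Replace missing y (NaN) by beta_hat * x, with beta_hat = sum(y_obs) / sum(x_obs)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()  # do not modify the caller's array
    observed = ~np.isnan(y)
    beta_hat = y[observed].sum() / x[observed].sum()
    y[~observed] = beta_hat * x[~observed]
    return y, beta_hat

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([21.0, 39.0, np.nan, 82.0, np.nan])
y_imp, beta_hat = ratio_impute(x, y)
```

Here $\hat\beta = 142/70$, so the two missing values are filled with $30\hat\beta$ and $50\hat\beta$; the observed values are left untouched.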

Methods

Results

Conclusion