Abstract

The paper deals with the scenario where some covariates are observed by design for a subset of the observations only. In the example treated in the paper this occurs with a two phase sampling scheme where in the first phase a relatively large sample is drawn to record a response variable Y and a set of (cheap) covariates x. In a second phase a smaller sample is drawn from the first phase sample where additional (usually expensive) covariates z are also recorded. The second phase can be drawn with unequal probability sampling, where the sampling weights depend on the observed Y and x. The overall intention is to fit a regression model of Y on both, x and z. Due to the design of the data collection we are faced with missing values for z for a majority of observations. We propose an approximate estimation approach using semi-parametric mean and variance regression of Y on x only and augment this fit with a full regression model of Y on x and z. The idea extends the approach of Little (1992) towards non-normal data and non-linear models. The proposed estimation is numerically rather simple and performs convincingly well in simulation studies compared to alternatives such as complete-case and multiple imputation analysis.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call