Abstract

When sample survey data with complex design (stratification, clustering, unequal selection or inclusion probabilities, and weighting) are used for linear models, estimation of model parameters and their covariance matrices becomes complicated. Standard fitting techniques for sample surveys either model conditional on survey design variables, or use only design weights based on inclusion probabilities, essentially assuming zero error covariance between all pairs of population elements. Design properties that link two units are not used. However, if population error structure is correlated, an unbiased estimate of the linear model error covariance matrix for the sample is needed for efficient parameter estimation. By making simultaneous use of sampling structure and design-unbiased estimates of the population error covariance matrix, the paper develops best linear unbiased estimation (BLUE) type extensions to standard design-based and joint design- and model-based estimation methods for linear models. The analysis covers both with-replacement and without-replacement sample designs. It recognises that estimation for with-replacement designs requires generalized inverses when any unit is selected more than once. This and the use of Hadamard products to link sampling and population error covariance matrix properties are central topics of the paper. Model-based linear model parameter estimation is also discussed.

Highlights

  • There are two relatively distinct methodologies for analysis of sample survey data collected via a complex sampling scheme that may include stratification, clustering and weighting of responses. In the 1980s and early 1990s there was considerable academic debate around whether design-based or model-based methods were better.

  • The model-assisted analysis of sample surveys had been foreshadowed, for example in Cochran [3], and the joint design- and model-based approach was considered in detail in Haslett [8] and used in Fuller [4].

  • When survey estimation is design-based and includes weights, via inclusion and joint inclusion probabilities or functions of them, the positive semidefiniteness of the estimated covariance structure for the error in a linear model constructed from survey data cannot be guaranteed, except if all joint inclusion probabilities, or equivalently covariances between population elements, are ignored.
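The last highlight can be illustrated numerically. The sketch below assumes that the design-based estimate of the sample error covariance is formed as a Hadamard (elementwise) product of the population covariance entries with inverse joint inclusion probabilities; that specific form, the names `V`, `P`, `V_hat` and `V_diag`, and all numeric values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical population error covariance for three sampled units:
# the all-ones matrix is positive semidefinite (rank one).
V = np.ones((3, 3))

# Hypothetical inclusion probabilities: pi_i = 0.5 on the diagonal,
# joint inclusion probabilities pi_ij = 0.1 off the diagonal.
P = np.full((3, 3), 0.1)
np.fill_diagonal(P, 0.5)

# Assumed estimate: Hadamard product of V with elementwise 1 / P.
V_hat = V / P
eig_full = np.linalg.eigvalsh(V_hat)

# Ignoring joint inclusion probabilities keeps only the diagonal terms.
V_diag = np.diag(np.diag(V) / np.diag(P))
eig_diag = np.linalg.eigvalsh(V_diag)

print("eigenvalues using joint probabilities:", eig_full)
print("eigenvalues using the diagonal only:  ", eig_diag)
```

With these values the full estimate has negative eigenvalues even though `V` itself is positive semidefinite, while the diagonal-only version cannot, since its entries are non-negative variances divided by positive probabilities.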


Summary

Introduction

There are two relatively distinct methodologies for analysis of sample survey data collected via a complex sampling scheme that may include stratification, clustering and weighting of responses. Neither selection nor inclusion and joint inclusion probabilities are generally used, and parameter estimates via design- and model-based methods are not necessarily equal. Models can be fitted to sample survey data to provide better estimates of the parameters and of their covariance matrices. The standard, design-based way to fit a linear model to survey data is to use inverse inclusion probabilities as weights within a model-based context. The standard design-based least squares solution for full rank X (e.g., Chambers & Skinner [2], Skinner et al. [18]) is

\[ \hat{\beta} = (X^{\top} \Pi^{-1} X)^{-1} X^{\top} \Pi^{-1} Y \tag{2.2} \]

This design-based solution (2.2) is weighted least squares, using weights equal to the inverse of the inclusion probabilities for the sampled units.
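A minimal sketch of the weighted least squares solution (2.2) follows, assuming Π is the diagonal matrix of inclusion probabilities for the sampled units and X has full column rank. The function name `design_weighted_ls` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def design_weighted_ls(X, y, pi):
    """Solve (X' Pi^{-1} X) beta = X' Pi^{-1} y for diagonal Pi.

    X  : (n, p) design matrix for the sampled units (full column rank)
    y  : (n,) responses
    pi : (n,) inclusion probabilities, all strictly positive
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)  # inverse inclusion probabilities
    XtW = X.T * w                          # X' Pi^{-1}, without forming Pi
    return np.linalg.solve(XtW @ X, XtW @ y)

# Toy data: y is exactly 1 + 2x, so any positive weights recover (1, 2).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
pi = np.array([0.2, 0.5, 0.5, 0.8])        # illustrative probabilities

beta = design_weighted_ls(X, y, pi)
print(beta)
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse. Only the diagonal of Π enters, so joint inclusion probabilities play no role in this estimator.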

Define χP to be the
Sampling with replacement
Nn is also nN
Conclusions