Abstract
When sample survey data with complex design (stratification, clustering, unequal selection or inclusion probabilities, and weighting) are used for linear models, estimation of model parameters and their covariance matrices becomes complicated. Standard fitting techniques for sample surveys either model conditional on survey design variables, or use only design weights based on inclusion probabilities, essentially assuming zero error covariance between all pairs of population elements. Design properties that link two units are not used. However, if the population error structure is correlated, an unbiased estimate of the linear model error covariance matrix for the sample is needed for efficient parameter estimation. By making simultaneous use of sampling structure and design-unbiased estimates of the population error covariance matrix, the paper develops best linear unbiased estimation (BLUE) type extensions to standard design-based and joint design- and model-based estimation methods for linear models. The analysis covers both with- and without-replacement sample designs. It recognises that estimation for with-replacement designs requires generalized inverses when any unit is selected more than once. This, and the use of Hadamard products to link sampling and population error covariance matrix properties, are central topics of the paper. Model-based linear model parameter estimation is also discussed.
Highlights
There are two relatively distinct methodologies for analysis of sample survey data collected via a complex sampling scheme that may include stratification, clustering and weighting of responses. In the 1980s and early 1990s there was considerable academic debate around whether design-based or model-based methods were better.
The model-assisted analysis of sample surveys had been foreshadowed, for example in Cochran [3], and the joint design- and model-based approach was considered in detail in Haslett [8] and used in Fuller [4].
When survey estimation is design based and includes weights, via inclusion and joint inclusion probabilities, or functions of them, the positive semidefiniteness of the estimated covariance structure for the error in a linear model constructed from survey data cannot be guaranteed, except if all joint inclusion probabilities, or equivalently covariances between population elements, are ignored.
Summary
There are two relatively distinct methodologies for analysis of sample survey data collected via a complex sampling scheme that may include stratification, clustering and weighting of responses. Neither selection nor inclusion and joint inclusion probabilities are generally used, and parameter estimates via design- and model-based methods are not necessarily equal. Models can be fitted to sample survey data to provide better estimates of the parameters and of their covariance matrices. The standard, design-based way to fit a linear model to survey data is to use inverse inclusion probabilities as weights within a model-based context. The standard design-based least squares solution for full rank X (e.g., Chambers & Skinner [2], Skinner et al. [18]) is

$$\hat{\beta} = (X^{\top} \Pi^{-1} X)^{-1} X^{\top} \Pi^{-1} Y \qquad (2.2)$$

This design-based solution (2.2) is weighted least squares, using weights equal to the inverse of the inclusion probabilities for the sampled units.
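As a rough illustration of the design-based solution (2.2), the sketch below computes the weighted least squares estimate in plain Python, assuming Π is the diagonal matrix of inclusion probabilities for the sampled units. The function names `design_weighted_ls` and `solve_linear_system` and the small data set are hypothetical, chosen only for this example. The sketch ignores joint inclusion probabilities and error covariance, so it is not the BLUE-type extension the paper develops.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; X must have full column rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def design_weighted_ls(
    x_rows: Sequence[Sequence[float]],
    y: Sequence[float],
    incl_probs: Sequence[float],
) -> List[float]:
    """Return (X' Pi^-1 X)^-1 X' Pi^-1 Y with Pi = diag(incl_probs) (assumed diagonal)."""
    p = len(x_rows[0])
    weights = [1.0 / pi for pi in incl_probs]  # inverse inclusion probabilities
    # Accumulate X' Pi^-1 X and X' Pi^-1 Y one sampled unit at a time.
    xtwx = [
        [sum(w * row[j] * row[k] for w, row in zip(weights, x_rows)) for k in range(p)]
        for j in range(p)
    ]
    xtwy = [
        sum(w * row[j] * yi for w, row, yi in zip(weights, x_rows, y))
        for j in range(p)
    ]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical sample: intercept column plus one covariate, unequal inclusion probabilities.
x_sample = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y_sample = [1.0, 3.0, 5.0, 7.0]
pi_sample = [0.1, 0.2, 0.5, 0.25]
beta_hat = design_weighted_ls(x_sample, y_sample, pi_sample)
```

When all inclusion probabilities are equal, the weights cancel and the estimate reduces to ordinary least squares, which is one quick check that the weighting is applied as intended.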