Abstract

Data from a large number of covariates with known population totals are frequently observed in survey studies. These auxiliary variables contain valuable information that can be incorporated into estimation of the population total of a survey variable to improve the estimation precision. We consider the generalized regression estimator formulated under the model-assisted framework in which a regression model is utilized to make use of the available covariates while the estimator still has basic design-based properties. The generalized regression estimator has been shown to improve the efficiency of the design-based Horvitz-Thompson estimator when the number of covariates is fixed. In this study, we investigate the performance of the generalized regression estimator when the number of covariates p is allowed to diverge as the sample size n increases. We examine two approaches where the model parameter is estimated using the weighted least squares method when p < n and the LASSO method when the model parameter is sparse. We show that under an assisted model and certain conditions on the joint distribution of the covariates as well as the divergence rates of n and p, the generalized regression estimator is asymptotically more efficient than the Horvitz-Thompson estimator, and is robust against model misspecification. We also study the consistency of variance estimation for the generalized regression estimator. Our theoretical results are corroborated by simulation studies and an example.
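As a rough illustration of the two estimators compared in the abstract, the sketch below computes the Horvitz-Thompson estimator and a generalized regression estimator with a weighted least squares fit, for the case p < n. It is not the paper's code. The function names, the simple random sampling design, the simulated population, and the inclusion of an intercept column (whose population total is N) are all assumptions made for the example. The LASSO variant is not shown.

```python
import numpy as np


def horvitz_thompson(y, pi):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    return float(np.sum(y / pi))


def greg(y, X, pi, x_totals):
    """Generalized regression estimate of the population total of y.

    y        : (n,) sampled survey variable
    X        : (n, p) sampled covariates (requires p < n and full column rank)
    pi       : (n,) first-order inclusion probabilities
    x_totals : (p,) known population totals of the covariates
    """
    w = 1.0 / pi  # design weights
    # Survey-weighted least squares coefficient estimate.
    XtWX = X.T @ (w[:, None] * X)
    XtWy = X.T @ (w * y)
    beta_hat = np.linalg.solve(XtWX, XtWy)
    # HT estimate of y plus a regression adjustment for the gap between
    # the known covariate totals and their HT estimates.
    x_ht = X.T @ w
    return float(np.sum(w * y) + (x_totals - x_ht) @ beta_hat)


# Hypothetical finite population, simulated only for illustration.
rng = np.random.default_rng(0)
N, n, p = 5000, 200, 5
X_pop = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.arange(1, p + 2, dtype=float)
y_lin = X_pop @ beta_true                 # exactly linear in the covariates
y_pop = y_lin + rng.normal(size=N)        # linear model plus noise

# Simple random sampling without replacement: pi_i = n / N for every unit.
sample = rng.choice(N, size=n, replace=False)
pi = np.full(n, n / N)
x_totals = X_pop.sum(axis=0)

t_ht = horvitz_thompson(y_pop[sample], pi)
t_greg = greg(y_pop[sample], X_pop[sample], pi, x_totals)
print("true total:", y_pop.sum())
print("HT estimate:", t_ht)
print("GREG estimate:", t_greg)
```

One check that follows directly from the formula: when the survey variable is an exact linear function of the covariates, the regression adjustment cancels the sampling error and the estimator returns the population total exactly, whichever sample is drawn.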

