Downweighting Influential Clusters in Surveys

Alan M Zaslavsky, Nathaniel Schenker, Thomas R Belin

doi:10.1198/016214501753208889

Abstract

Certain clusters may be extremely influential on survey estimates and consequently contribute disproportionately to their variance. We propose a general approach to estimation that downweights highly influential clusters, with the amount of downweighting based on M-estimation applied to the empirical influence of the clusters. The method is motivated by a problem in census coverage estimation, and we illustrate it by using data from the 1990 Post Enumeration Survey (PES). In this context, an objective, prespecified methodology for handling influential observations is essential to avoid having to justify judgmental post hoc adjustment of weights. In 1990, both extreme weights and large errors in the census led to extreme influence. We estimated influence by Taylor linearization of the survey estimator, and we applied M-estimators based on the t distribution and the Huber ψ-function. As predicted by theory, the robust procedures greatly reduced the estimated variance of estimated coverage rates, more so than did truncation of weights. On the other hand, the procedure may introduce bias into survey estimates when the distributions of the influence statistics are asymmetric. We consider the properties of the estimators in the presence of asymmetry, and we demonstrate techniques for assessing the bias-variance trade-off, finding that estimated mean squared error is reduced by applying the robust procedure to our dataset. We also suggest PES design improvements to reduce the impact of influential clusters.
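
As a rough illustration of the downweighting idea described in the abstract, the sketch below applies Huber-type weights to a list of per-cluster influence values. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the use of the median as the center, the MAD-based scale, and the tuning constant c = 1.5 are all assumptions chosen for illustration, and the influence values are made up.

```python
from statistics import median


def huber_weights(influence, c=1.5):
    """Return Huber-type weights for a list of cluster influence values.

    Assumption: the center is the median and the scale is the MAD rescaled
    by 1.4826 (consistent with the standard deviation under normality).
    Clusters within c * scale of the center keep weight 1; clusters beyond
    that band get weight c * scale / |residual|, which is less than 1.
    """
    center = median(influence)
    residuals = [z - center for z in influence]
    scale = 1.4826 * median(abs(r) for r in residuals)
    if scale == 0:
        # Degenerate case: no spread to measure, so leave all clusters alone.
        return [1.0] * len(influence)
    bound = c * scale
    return [1.0 if abs(r) <= bound else bound / abs(r) for r in residuals]


def downweighted_influence(influence, c=1.5):
    """Shrink each influence value toward the center using its Huber weight."""
    center = median(influence)
    weights = huber_weights(influence, c)
    return [center + w * (z - center) for z, w in zip(influence, weights)]


# Hypothetical influence values: five ordinary clusters and one extreme one.
example_influence = [-1.0, -0.5, 0.0, 0.5, 1.0, 12.0]
example_weights = huber_weights(example_influence)
example_adjusted = downweighted_influence(example_influence)
```

In this toy example only the extreme cluster receives a weight below 1, so its contribution is pulled back to the edge of the band while the other clusters are unchanged. Because the shrinkage is applied symmetrically about the center, it can shift the estimate when the influence distribution is skewed, which is the bias concern the abstract raises.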

Full Text