Revisiting proportion estimators

Dankmar Böhning, Chukiat Viwatwongkasem

doi:10.1191/0962280205sm393oa

Abstract

Proportion estimators are used frequently in many application areas. The conventional proportion estimator (number of events divided by sample size) encounters a number of problems when the data are sparse, as will be demonstrated in various settings. The problem of estimating its variance when sample sizes become small is rarely addressed in a satisfying framework. Specifically, we have in mind applications like the weighted risk difference in multicenter trials or stratifying risk ratio estimators (to adjust for potential confounders) in epidemiological studies. We suggest estimating p using the parametric family p(c), and p(1 − p) using p(c)(1 − p(c)), where p(c) = (X + c)/(n + 2c). We investigate the problem of choosing c ≥ 0 from various perspectives, including minimizing the average mean squared error of p(c), and the average bias and average mean squared error of p(c)(1 − p(c)). The optimal value of c for minimizing the average mean squared error of p(c) is found to be independent of n and equals c = 1. The optimal value of c for minimizing the average mean squared error of p(c)(1 − p(c)) is found to depend on n, with limiting value c = 0.833. This might justify using the near-optimal value c = 1 in practice, which also turns out to be beneficial when constructing confidence intervals of the form p(c) ± 1.96 √(n p(c)(1 − p(c)))/(n + 2c).
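As an illustration of the quantities named in the abstract, the following is a minimal Python sketch, not the authors' code. The function names are hypothetical. The interval uses the reading p(c) ± 1.96 √(n p(c)(1 − p(c)))/(n + 2c), which matches Var(p(c)) = n p(1 − p)/(n + 2c)². The function average_mse assumes that "average" means integrating the mean squared error over p with uniform weight on [0, 1]; the abstract does not state the weighting, so this is an assumption.

```python
import math


def shrunk_proportion(x, n, c=1.0):
    """Estimator p(c) = (X + c) / (n + 2c); c = 0 gives the conventional X / n."""
    return (x + c) / (n + 2 * c)


def shrunk_interval(x, n, c=1.0, z=1.96):
    """Interval p(c) +/- z * sqrt(n * p(c) * (1 - p(c))) / (n + 2c)."""
    p_c = shrunk_proportion(x, n, c)
    half_width = z * math.sqrt(n * p_c * (1 - p_c)) / (n + 2 * c)
    return p_c - half_width, p_c + half_width


def average_mse(n, c):
    """Mean squared error of p(c) averaged over p (ASSUMED uniform on [0, 1]).

    For X ~ Binomial(n, p):
        variance = n p (1 - p) / (n + 2c)^2
        bias     = c (1 - 2p) / (n + 2c)
    Integrating variance + bias^2 over p in [0, 1] uses
        integral of p(1 - p)   = 1/6
        integral of (1 - 2p)^2 = 1/3
    """
    return (n / 6 + c * c / 3) / (n + 2 * c) ** 2


if __name__ == "__main__":
    # Sparse data: zero events in ten trials.
    print(shrunk_proportion(0, 10, c=0.0))  # conventional estimate
    print(shrunk_proportion(0, 10, c=1.0))  # estimate with c = 1
    print(shrunk_interval(0, 10, c=1.0))

    # Scan a grid of c values; under the uniform-weight assumption the
    # minimizer is c = 1 for every n tried here.
    grid = [i / 100 for i in range(0, 301)]
    for n in (5, 20, 100):
        best_c = min(grid, key=lambda c: average_mse(n, c))
        print(n, best_c)
```

With zero events the conventional estimator returns 0 and its plug-in variance estimate is also 0, whereas p(1) = 1/12 gives a nonzero variance estimate. Under the uniform-weight assumption, the derivative of average_mse with respect to c is proportional to (c − 1), which is consistent with the abstract's statement that the optimal c does not depend on n.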
