In many domains, the race data required to estimate racial disparity is difficult to obtain. To address this problem, practitioners have adopted proxy methods that predict race from non-protected covariates. However, these proxies often yield biased estimates, especially for minority groups, which limits their real-world utility. In this paper, we introduce two new contextual proxy models that advance existing methods by incorporating contextual features to improve race estimates. On real-world home loan and voter data, we show that these algorithms yield significant performance improvements in estimating disparities. We also establish that achieving unbiased disparity estimates with contextual proxies relies on mean-consistency, a calibration-like condition.