Tailored inference for finite populations: conditional validity and transfer across distributions

Ying Jin, Dominik Rothenhäusler

doi:10.1093/biomet/asad022

Abstract

Parameters of subpopulations can be more relevant than those of superpopulations. For example, a healthcare provider may be interested in the effect of a treatment plan for a specific subset of their patients; policymakers may be concerned with the impact of a policy in a particular state within a given population. In these cases, the focus is on a specific finite population, as opposed to an infinite superpopulation. Such a population can be characterized by fixing some attributes that are intrinsic to them, leaving unexplained variations like measurement error as random. Inference for a population with fixed attributes can then be modelled as inferring parameters of a conditional distribution. Accordingly, it is desirable that confidence intervals are conditionally valid for the realized population, instead of marginalizing over many possible draws of populations. We provide a statistical inference framework for parameters of finite populations with known attributes. Leveraging the attribute information, our estimators and confidence intervals closely target a specific finite population. When the data are from the population of interest, our confidence intervals attain asymptotic conditional validity, given the attributes, and are shorter than those for superpopulation inference. In addition, we develop procedures to infer parameters of new populations with differing covariate distributions; the confidence intervals are also conditionally valid for the new populations under mild conditions. Our methods extend to situations where the fixed information has a weaker structure or is only partially observed. We demonstrate the validity and applicability of our methods using simulated data and a real-world dataset for predicting car prices.
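The distinction between conditional and superpopulation targets can be illustrated with a small simulation. The sketch below is not the estimator proposed in the paper. It assumes a hypothetical linear model Y = beta * X + noise with known noise level, holds the attributes X of one realized population fixed, and redraws only the noise. All names (`simulate`, `beta`, `sigma`) are illustrative assumptions. Under these assumptions, the sample mean varies around the conditional target (1/n) * sum of E[Y_i | X_i] with standard deviation sigma / sqrt(n), which by the law of total variance is no larger than the superpopulation standard deviation sqrt(Var(Y) / n).

```python
import numpy as np


def simulate(n=200, reps=2000, beta=2.0, sigma=1.0, seed=0):
    """Compare conditional and superpopulation spread of the sample mean.

    Hypothetical model (an assumption for illustration only):
    Y = beta * X + sigma * eps, with X and eps standard normal.
    """
    rng = np.random.default_rng(seed)

    # Attributes of the realized finite population: drawn once, then held fixed.
    x = rng.normal(size=n)

    # Conditional target for this population: (1/n) * sum_i E[Y_i | X_i].
    theta_cond = beta * x.mean()

    # Redraw only the unexplained variation (noise) in each replicate.
    y = beta * x + sigma * rng.normal(size=(reps, n))
    est = y.mean(axis=1)

    # Standard deviations of the sample mean around each target (known here
    # because the model is assumed; in practice they would be estimated).
    sd_cond = sigma / np.sqrt(n)
    sd_super = np.sqrt((beta**2 + sigma**2) / n)

    # Coverage of the conditional target by the shorter conditional interval.
    covered = np.abs(est - theta_cond) <= 1.96 * sd_cond

    return {
        "empirical_sd": est.std(ddof=1),
        "sd_cond": sd_cond,
        "sd_super": sd_super,
        "coverage_cond": covered.mean(),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting the interval built from the conditional standard deviation is shorter than the one built from the superpopulation standard deviation, yet it should still cover the conditional target at roughly the nominal rate across noise redraws. This mirrors, in a simplified way, the abstract's claim about shorter, conditionally valid intervals.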

Full Text