Estimating individual treatment effects from observational data is crucial in causal inference, but it faces two major challenges: the absence of counterfactual outcomes, and selection bias caused by non-random treatment assignment and confounding variables. This paper addresses both by quantifying the distribution shift between the treated and control groups from a generative perspective, in order to learn a balanced representation and then predict causal effects. We propose a novel method called Denoising for a Balanced Representation in Treatment Effect Estimation (DBRT). Diffusion models aim to generate samples whose distribution matches a target distribution, which aligns with our goal of making the two groups' distributions similar in the latent space. Motivated by this, our approach treats the divergence between the treated and control groups as 'noise' to be removed incrementally. We construct a network that fits this noise and then removes it step by step, consistently reducing the discrepancy between the two groups. Additionally, to enhance the representation of input features, we incorporate memory vectors into the attention mechanism to capture prior information and combined correlations between covariates. Furthermore, we include the Hilbert–Schmidt Independence Criterion (HSIC) in the loss function to constrain the learned representation so that it remains relevant to the original data. On several datasets, DBRT outperforms previous classic methods, highlighting its effectiveness in treatment effect estimation.
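To make the HSIC term concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator with Gaussian kernels. The abstract does not state which estimator, kernel, or bandwidth DBRT uses, so the kernel choice, the bandwidths, and the function names `rbf_kernel` and `hsic` are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x. Bandwidth is an assumed default."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2.0 * x @ x.T        # pairwise squared Euclidean distances
    d2 = np.maximum(d2, 0.0)              # clip tiny negatives from round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, z, sigma_x=1.0, sigma_z=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma_x)
    L = rbf_kernel(z, sigma_z)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy data (hypothetical): x stands in for covariates, z for a learned representation.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
z_dep = x[:, :2] + 0.1 * rng.normal(size=(200, 2))  # depends on x
z_ind = rng.normal(size=(200, 2))                   # independent of x

hsic_dep = hsic(x, z_dep)
hsic_ind = hsic(x, z_ind)
print(hsic_dep, hsic_ind)
```

A larger HSIC value indicates stronger statistical dependence, so a loss that rewards high HSIC between the representation and the covariates would encourage the representation to stay informative about the original data. How DBRT weights or signs this term is not specified in the abstract.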