This article addresses the discovery of causal relations from a combination of observational data and qualitative assumptions about the nature of causality in the presence of unmeasured confounding. We focus on applications where unobserved variables are known to have a widespread effect on many of the observed ones. This setting is particularly difficult for constraint-based methods: most pairs of variables remain conditionally dependent given any subset of the other variables, rendering the causal effect unidentifiable. We show that, under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph (DAG) among observed variables may be recovered approximately, and we propose a simple adjusted score-based causal discovery algorithm that may be implemented with general-purpose solvers and scales to high-dimensional problems. We also find that, despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations from the model assumptions, and that extensions to nonlinear structural models are possible.