Abstract

We consider the problem of inferring the causal direction between two univariate numeric random variables X and Y from observational data. This case is especially challenging, as the graph X causes Y is Markov equivalent to the graph Y causes X, and hence it is impossible to determine the correct direction using conditional independence tests alone. To tackle this problem, we follow an information-theoretic approach based on the algorithmic Markov condition. This postulate states that, in terms of Kolmogorov complexity, the factorization given by the true causal model is the most succinct description of the joint distribution. This means that we can infer that X is a likely cause of Y when we need fewer bits to first transmit the data over X, and then the data of Y as a function of X, than for the inverse direction. That is, in this paper we perform causal inference by compression. To put this notion to practice, we employ the Minimum Description Length (MDL) principle, and propose a score to determine how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations. To determine whether an inference, i.e. the difference in compressed sizes, is significant, we propose two analytical significance tests based on the no-hypercompression inequality. Last but not least, we introduce the linear-time Slope and Sloper algorithms, which, as we show through thorough empirical evaluation, outperform the state of the art by a wide margin.
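
To make the scoring idea concrete, below is a minimal Python sketch of causal inference by compression under simplifying assumptions: it approximates L(X) + L(Y|X) and L(Y) + L(X|Y) with a Gaussian code for the marginal and for the residuals of a fixed-degree polynomial regression, and it declares the decision significant when the gain in bits exceeds -log2(alpha), following the no-hypercompression inequality P(gain >= k bits) <= 2^-k. The function names, the polynomial regressor, and the parameter-cost term are illustrative choices, not the Slope score itself.

```python
import numpy as np

def gaussian_bits(residuals):
    """Bits to encode residuals under a fitted Gaussian (negative log2-likelihood)."""
    n = len(residuals)
    var = np.var(residuals) + 1e-12
    return 0.5 * n * (np.log(2 * np.pi * var) + 1.0) / np.log(2)

def conditional_bits(cause, effect, degree=3):
    """Bits to encode effect given cause: polynomial parameters plus Gaussian-coded
    residuals (an illustrative score, not the one proposed in the paper)."""
    coeffs = np.polyfit(cause, effect, degree)
    res = effect - np.polyval(coeffs, cause)
    param_bits = 0.5 * (degree + 1) * np.log2(len(cause))  # crude cost per parameter
    return param_bits + gaussian_bits(res)

def marginal_bits(data):
    """Bits to encode a variable on its own, here with a two-parameter Gaussian code."""
    return np.log2(len(data)) + gaussian_bits(data - np.mean(data))

def infer_direction(x, y, alpha=0.001):
    """Compare L(X) + L(Y|X) with L(Y) + L(X|Y); the cheaper total wins.
    By the no-hypercompression inequality, P(gain >= k bits) <= 2**-k, so the
    decision is treated as significant once the gain exceeds -log2(alpha) bits."""
    bits_xy = marginal_bits(x) + conditional_bits(x, y)
    bits_yx = marginal_bits(y) + conditional_bits(y, x)
    gain = abs(bits_xy - bits_yx)
    direction = "X->Y" if bits_xy < bits_yx else "Y->X"
    return direction, gain, gain >= -np.log2(alpha)

# Toy example where Y is generated from X plus additive noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 1000)
y = x ** 3 + rng.normal(scale=1.0, size=1000)
print(infer_direction(x, y))  # expected: ('X->Y', <gain in bits>, True)
```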

Highlights

  • Telling apart cause and effect given only observational data is one of the fundamental problems in science [22,31]

  • In this paper we perform causal inference by compression. To put this notion to practice, we employ the Minimum Description Length principle, and propose a score to determine how many bits we need to transmit the data using a class of regression functions that can model both local and global functional relations

  • As we model Y as a function of X and noise, our approach is somewhat reminiscent of causal inference based on Additive Noise Models (ANMs) [30], where one assumes that Y is generated as a function of X plus additive noise, Y = f(X) + N with X ⊥⊥ N (see the sketch below)
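
For comparison, a minimal ANM-style baseline under these assumptions might look as follows: regress the effect on the cause and prefer the direction in which the residuals appear independent of the putative cause. The polynomial regressor, the RBF bandwidth heuristic, and the HSIC dependence proxy (the functions rbf_kernel, hsic, residuals, anm_direction) are hypothetical illustration choices, not the specific estimators used in the ANM literature or in this paper.

```python
import numpy as np

def rbf_kernel(a):
    """Gaussian kernel matrix with the median-distance heuristic for the bandwidth."""
    d = (a[:, None] - a[None, :]) ** 2
    sigma2 = np.median(d[d > 0]) / 2
    return np.exp(-d / (2 * sigma2))

def hsic(a, b):
    """Biased HSIC estimate; close to zero when a and b are independent."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(H @ rbf_kernel(a) @ H @ rbf_kernel(b)) / n ** 2

def residuals(cause, effect, degree=3):
    """Fit effect = f(cause) + N with a polynomial f and return the estimated noise N."""
    coeffs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coeffs, cause)

def anm_direction(x, y):
    """Prefer the direction whose residuals are (more) independent of the putative cause."""
    dep_xy = hsic(x, residuals(x, y))
    dep_yx = hsic(y, residuals(y, x))
    return "X->Y" if dep_xy < dep_yx else "Y->X"

# Example: Y is a nonlinear function of X plus additive noise, so X ⊥⊥ N holds only for X -> Y
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 400)
y = np.tanh(2 * x) + rng.normal(scale=0.2, size=400)
print(anm_direction(x, y))  # expected: X->Y
```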


Introduction

Telling apart cause and effect given only observational data is one of the fundamental problems in science [22,31]. We are interested in identifying whether X causes Y, whether Y causes X, or whether they are merely correlated.