A statistical semantics for causation

J. Pearl, T. S. Verma

doi:10.1007/978-1-4899-4537-2_24

Abstract

Statistics and Computing (1992) 2, 91-95

Judea Pearl and Thomas S. Verma
Cognitive Systems Laboratory, Computer Science Department, University of California, Los Angeles, CA 90024, USA

Received January 1991 and accepted September 1991

We propose a model-theoretic definition of causation, and show that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning. We also establish a sound characterization of the conditions under which such a distinction is possible. Finally, we provide a proof-theoretical procedure for inductive causation and show that, for a large class of data and structures, effective algorithms exist that uncover the direction of causal influences as defined above.

Keywords: Causality, induction, learning

1. The model

We view the task of causal modeling as an identification game which scientists play against nature. Nature possesses stable causal mechanisms which, on a microscopic level, are deterministic functional relationships between variables, some of which are unobservable. These mechanisms are organized in the form of an acyclic schema which the scientist attempts to identify.

Definition 1. A causal model over a set of variables U is a directed acyclic graph (DAG) D, the nodes of which denote variables, and the links denote direct binary causal influences.

The causal model serves as a blueprint for forming a 'causal theory': a precise specification of how each variable is influenced by its parents in the DAG. Here we assume that nature is at liberty to impose arbitrary functional relationships between each effect and its causes and then to weaken these relationships by introducing arbitrary (yet mutually independent) disturbances. These disturbances reflect 'hidden' or unmeasurable conditions and exceptions which nature chooses to govern by some undisclosed probability function.

Definition 2. A causal theory is a pair $T = \langle D, \Theta_D \rangle$ containing a causal model $D$ and a set of parameters $\Theta_D$ compatible with $D$. $\Theta_D$ assigns a function $x_i = f_i[pa(x_i), \epsilon_i]$ and a probability measure $g_i$ to each $x_i \in U$, where $pa(x_i)$ are the parents of $x_i$ in $D$ and each $\epsilon_i$ is a random disturbance distributed according to $g_i$, independently of the other $\epsilon$'s and of $\{x_j\}_{j=1}^{i-1}$.

The requirement of independence renders the disturbances 'local' to each family; disturbances that influence several families simultaneously will be treated explicitly as 'latent' variables (see Definition 3 below).

Once a causal theory $T$ is formed, it defines a joint probability distribution $P(T)$ over the variables in the system, and this distribution reflects some features of the causal model (e.g., each variable must be independent of its grandparents, given the values of its parents). Nature then permits the scientist to inspect a select subset $O$ of 'observed' variables, and to ask questions about the probability distribution over the observables, but hides the underlying causal theory as well as the structure of the causal model. We investigate the feasibility of recovering the topology of the DAG from features of the probability distribution.¹

¹ This formulation employs several idealizations of the actual task of scientific discovery. It assumes, for example, that the scientist obtains the distribution directly, rather than events sampled from the distribution. This assumption is justified when a large sample is available, sufficient to reveal all the dependencies embedded in the distribution. Additionally, we assume that the observed variables actually appear in the original causal theory and are not some aggregate thereof. Aggregation might result in feedback loops, which we do not discuss in this paper. Our theory also takes variables as the primitive entities in the language, not events, which permits us to include 'enabling' and 'preventing' relationships as part of the mechanism.

0960-3174/92 © 1992 Chapman & Hall
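
As an illustration of Definitions 1 and 2, the following minimal sketch builds one small causal theory and checks the feature of P(T) mentioned in Section 1: a variable is independent of its grandparent given its parent. Everything specific here is an assumption made for the example and does not come from the paper: the three-node chain x1 -> x2 -> x3, the binary variables, the XOR form of the functions, and the disturbance probabilities in `P_E`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical causal model D: the chain x1 -> x2 -> x3 (parents listed per node).
PARENTS = {"x1": [], "x2": ["x1"], "x3": ["x2"]}
ORDER = ["x1", "x2", "x3"]  # a topological order of D

# Hypothetical parameters: each disturbance e_i is an independent binary
# variable with P(e_i = 1) as given here (these play the role of the g_i).
P_E = {"x1": Fraction(1, 2), "x2": Fraction(1, 10), "x3": Fraction(1, 5)}


def f(pa_values, e):
    """Assumed mechanism: a root copies its disturbance; a child copies its
    single parent, flipped when the disturbance is 1."""
    if not pa_values:
        return e
    return pa_values[0] ^ e


def joint_distribution():
    """Enumerate all disturbance settings and accumulate P(T) over (x1, x2, x3)."""
    joint = {}
    for es in product([0, 1], repeat=len(ORDER)):
        weight = Fraction(1)
        values = {}
        for name, e in zip(ORDER, es):
            weight *= P_E[name] if e == 1 else 1 - P_E[name]
            values[name] = f([values[p] for p in PARENTS[name]], e)
        key = tuple(values[n] for n in ORDER)
        joint[key] = joint.get(key, Fraction(0)) + weight
    return joint


def prob(joint, **fixed):
    """Marginal probability that the named variables take the given values."""
    idx = {n: i for i, n in enumerate(ORDER)}
    return sum(p for k, p in joint.items()
               if all(k[idx[n]] == v for n, v in fixed.items()))


def independent_given_parent(joint):
    """Check P(x1, x2, x3) P(x2) == P(x1, x2) P(x2, x3) for all values,
    i.e. x3 is independent of its grandparent x1 given its parent x2."""
    for x1, x2, x3 in product([0, 1], repeat=3):
        lhs = prob(joint, x1=x1, x2=x2, x3=x3) * prob(joint, x2=x2)
        rhs = prob(joint, x1=x1, x2=x2) * prob(joint, x2=x2, x3=x3)
        if lhs != rhs:
            return False
    return True


JOINT = joint_distribution()
print(independent_given_parent(JOINT))  # True
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equality test is not affected by floating-point rounding. In this toy theory x1 and x3 remain marginally dependent; only conditioning on x2 separates them.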

Full Text