I recently discussed pitfalls in attempted causal inference based on reduced‐form regression models. I used as motivation a real‐world example from a paper by Dr. Sneeringer, which interpreted a reduced‐form regression analysis as implying the startling causal conclusion that “doubling of [livestock] production leads to a 7.4% increase in infant mortality.” This conclusion is based on: (A) fitting a reduced‐form regression model to aggregate (e.g., county‐level) data; and (B) (mis)interpreting a regression coefficient in this model as a causal coefficient, without performing any formal statistical tests for potential causation (such as conditional independence, Granger‐Sims, or path analysis tests). Dr. Sneeringer now adds comments that confirm and augment these deficiencies, while advocating methodological errors that, I believe, risk analysts should avoid if they want to reach logically sound, empirically valid, conclusions about cause and effect. She explains that, in addition to (A) and (B) above, she also performed other steps such as (C) manually selecting specific models and variables and (D) assuming (again, without testing) that hand‐picked surrogate variables are valid (e.g., that log‐transformed income is an adequate surrogate for poverty). In her view, these added steps imply that “critiques of A and B are not applicable” to her analysis and that therefore “a causal argument can be made” for “such a strong, robust correlation” as she believes her regression coefficient indicates. However, multiple wrongs do not create a right. Steps (C) and (D) exacerbate the problem of unjustified causal interpretation of regression coefficients, without rendering irrelevant the fact that (A) and (B) do not provide evidence of causality. This reply focuses on whether any statistical techniques can produce the silk purse of a valid causal inference from the sow's ear of a reduced‐form regression analysis of ecological data. We conclude that Dr. Sneeringer's analysis provides no valid indication that air pollution from livestock operations causes any increase in infant mortality rates. More generally, reduced‐form regression modeling of aggregate population data—no matter how it is augmented by fitting multiple models and hand‐selecting variables and transformations—is not adequate for valid causal inference about health effects caused by specific, but unmeasured, exposures.
Read full abstract