Abstract

We point out an instantiation of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing a large-scale study from China (February 17) with early reports from Italy (March 9), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and case fatality. We curate an age-stratified CFR dataset with more than 750k cases and conduct a case study, investigating total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age and facilitates a more transparent comparison of CFRs across countries at different stages of the COVID-19 pandemic. Using longitudinal data from Italy, we discover a sign reversal of the direct causal effect in mid-March, which temporally aligns with the reported collapse of the healthcare system in parts of the country. Moreover, we find that direct and indirect effects across 132 pairs of countries are only weakly correlated, suggesting that a country's policy and case demographic may be largely unrelated. We point out limitations and extensions for future work, and finally, discuss the role of causal reasoning in the broader context of using AI to combat the COVID-19 pandemic.

Impact Statement: During a global pandemic, understanding the causal effects of risk factors such as age on COVID-19 fatality is an important scientific question. Since randomised controlled trials are typically infeasible or unethical in this context, causal investigations based on observational data, such as the one carried out in this article, will be crucial in guiding our understanding of the available data. Causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes; help educate the public and decision-makers alike; avoid unsound comparisons; and answer a range of causal questions pertaining to the pandemic, subject to transparently stated assumptions. Our exposition helps clarify how mediation analysis can be used to investigate direct and indirect effects along different causal paths and thus serves as a stepping stone for future studies of other important risk factors for COVID-19 besides age.

Highlights

  • The 2019–20 coronavirus pandemic originates from the SARS-CoV-2 virus, which causes the associated infectious respiratory disease COVID-19

  • For example, we may ask Q_NIE: “How would the overall CFR in China change if the case demographic had instead been that of Italy, while keeping all else (i.e., the CFRs of each age group) the same?” Since this considers changing the mediator to the natural distribution it would follow under a change of treatment while keeping the treatment itself the same (the Chinese CFRs), the answer to this question is referred to as the average natural indirect effect (NIE)

  • To employ the tools from mediation analysis outlined in Section IV to better understand the influence of age on COVID-19 CFRs, we curated a dataset of confirmed cases and fatalities by age group (0–9, 10–19, etc.) from 11 countries (Argentina, China, Colombia, Italy, the Netherlands, Portugal, South Africa, Spain, Sweden, Switzerland, and South Korea) and the Diamond Princess cruise ship, on which the disease spread among passengers forced to quarantine on board [27]
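
The natural indirect effect question in the highlights can be made concrete with a small sketch. It assumes the coarse-grained graph described in the abstract (country → age → fatality, plus country → fatality) with no unobserved confounding, so that counterfactual CFRs reduce to reweighting age-specific CFRs by a case demographic. All numbers, the country labels "A" and "B", and the function names are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
# Illustrative sketch only: made-up numbers, not the paper's data.
# Three coarse age groups, in order: 0-49, 50-69, 70+.

# P(age group | country): share of confirmed cases falling in each age group.
case_share = {
    "A": [0.60, 0.30, 0.10],
    "B": [0.20, 0.30, 0.50],
}

# P(death | country, age group): age-specific case fatality rates.
cfr_by_age = {
    "A": [0.004, 0.040, 0.200],
    "B": [0.002, 0.030, 0.180],
}


def expected_cfr(rates_from, demographic_from):
    """Overall CFR using one country's age-specific CFRs and
    (possibly another) country's case demographic."""
    rates = cfr_by_age[rates_from]
    shares = case_share[demographic_from]
    return sum(r * p for r, p in zip(rates, shares))


def total_effect(src, dst):
    """Change in overall CFR when switching country from src to dst."""
    return expected_cfr(dst, dst) - expected_cfr(src, src)


def natural_direct_effect(src, dst):
    """Switch the age-specific CFRs to dst, keep src's case demographic."""
    return expected_cfr(dst, src) - expected_cfr(src, src)


def natural_indirect_effect(src, dst):
    """Switch the case demographic to dst, keep src's age-specific CFRs."""
    return expected_cfr(src, dst) - expected_cfr(src, src)


if __name__ == "__main__":
    print("overall CFR A:", expected_cfr("A", "A"))
    print("overall CFR B:", expected_cfr("B", "B"))
    print("TE  A->B:", total_effect("A", "B"))
    print("NDE A->B:", natural_direct_effect("A", "B"))
    print("NIE A->B:", natural_indirect_effect("A", "B"))
```

In this toy setup, country B has a lower CFR than country A in every age group but a higher overall CFR, mirroring the paradox described in the abstract: the direct effect from A to B is negative, while the age-mediated indirect effect is positive and larger in magnitude.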


Summary

INTRODUCTION

The 2019–20 coronavirus pandemic originates from the SARS-CoV-2 virus, which causes the associated infectious respiratory disease COVID-19. This example illustrates how a traditional statistical analysis provides insufficient understanding of the data, and needs to be augmented by additional assumptions about the underlying causal relationships. We hope that this article can serve as a stepping stone for further studies to gain better insight into the mechanisms underlying COVID-19 fatality using a principled and transparent causal framework.

SIMPSON’S PARADOX IN COMPARING CFRs BETWEEN CHINA AND ITALY
CAUSAL MODEL FOR COVID-19 CFR DATA
Included Variables
Observational Sample and Causal Sufficiency
Data Generating Process and Causal Graph
Mediation Formulas
Mediation Analysis in AI
Dataset
Tracing Causal Effects Over Time
Comparison Between Several Different Countries
LIMITATIONS AND FUTURE
Considering Additional Mediators
Testing Strategy and Selection Bias
DISCUSSION
Simpson’s Paradox in the Context of AI
AI Against COVID-19: A Causal View
Findings
VIII. CONCLUSION