From correlation to causation using directed topological overlap matrix: Applications in genomics.

Borzou Alipourfard, Jean Gao

doi:10.1016/j.ymeth.2023.09.005

Abstract

Most causal discovery tools assume the local causal Markov condition. However, the theoretical assumptions that underlie this condition are often not met in practice. This is especially marked in genomics, where the presence of measurement errors, averaging effects, and feedback loops significantly undermines the legitimacy of the local causal Markov condition. Furthermore, these causal discovery algorithms require very large samples, orders of magnitude above what is often available. In this paper, relaxing the local causal Markov condition and using Reichenbach's common cause principle instead, we present a more flexible approach to causal discovery, the directed topological overlap matrix (DTOM). DTOM is robust with respect to measurement errors, averaging effects, and feedback loops, and is significantly more sample efficient. We study the utility of DTOM for discovering causal relations in biological data using three real gene expression datasets. We first examine whether DTOM can help distinguish the Myostatin mutation in Piedmontese cattle, which is the cause of the double-muscling the breed is famous for, by contrasting the muscle transcriptomes of Piedmontese and Wagyu crosses. We then consider a large-scale gene deletion study in yeast, and show that DTOM allows us to distinguish the deleted gene in a sample knowing only the set of differentially expressed genes in that sample. Finally, we examine the progression of Alzheimer's disease (AD) under the lens of DTOM. The genes implicated by our DTOM analysis as having a causal role in the progression of AD were significantly enriched in cellular components that have been repeatedly implicated in the progression of AD.

Full Text