Learning Subject-Specific Directed Acyclic Graphs With Mixed Effects Structural Equation Models From Observational Data

Xiang Li, Peter McColgan, Shanghong Xie, Sarah J. Tabrizi, Rachael I. Scahill, Yuanjia Wang, Donglin Zeng

doi:10.3389/fgene.2018.00430

Abstract

Identifying causal relationships among random variables from large-scale observational data using directed acyclic graphs (DAGs) is highly challenging. We propose a new mixed-effects structural equation model (mSEM) framework for estimating subject-specific DAGs. In this framework, the joint distribution of the random variables in the DAG is represented as a set of structural causal equations with mixed effects. The directed edges between nodes depend on each individual's observed exogenous covariates and on unobserved latent variables. The strength of each connection is decomposed into two terms: a fixed-effect term, which represents the average causal effect given the covariates, and a random-effect term, which represents the latent causal effect due to unobserved pathways. This decomposition captures essential asymmetric structural information and heterogeneity between DAGs, and this is what allows the causal structure to be identified from observational data. In addition, pooling information across subject-specific DAGs lets us identify the causal structure with high probability and estimate subject-specific networks with high precision. We propose a penalized likelihood-based approach to handle the high dimensionality of the DAG model. We also propose a fast, iterative computational algorithm, DAG-MM, which estimates the mSEM parameters and achieves the desired sparsity by hard-thresholding the edges. We theoretically prove the identifiability of the mSEM. Using simulations and an application to protein signaling data, we show substantially improved performance compared with existing methods. We also show results consistent with a network estimated from interventional data. Finally, we identify gray matter atrophy networks across brain regions in patients with Huntington's disease and corroborate these findings using white matter connectivity data collected in an independent study.
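The abstract states that DAG-MM obtains sparsity by hard-thresholding the edges. The following is a minimal sketch of that generic operation. It is not the paper's DAG-MM code, and it makes several assumptions:

- Edge strengths have already been estimated for every subject.
- They are stored in a hypothetical array of shape (subjects, p, p).
- A hypothetical threshold `tau` is given.

The sketch zeroes the weak edges. It then reports, for each directed edge, the fraction of subjects in which the edge survives. This is one simple way to summarize information across subject-specific graphs.

```python
import numpy as np


def hard_threshold(weights: np.ndarray, tau: float) -> np.ndarray:
    """Return a copy of `weights` with every entry of magnitude below `tau` set to zero."""
    return np.where(np.abs(weights) >= tau, weights, 0.0)


def edge_frequency(subject_weights: np.ndarray, tau: float) -> np.ndarray:
    """Fraction of subjects in which each directed edge j -> k survives thresholding.

    `subject_weights` has shape (n_subjects, p, p); entry [i, j, k] holds the
    estimated strength of edge j -> k for subject i (hypothetical layout).
    """
    kept = hard_threshold(subject_weights, tau) != 0.0
    return kept.mean(axis=0)


# Made-up estimates for 3 subjects on a 3-node graph (illustration only).
W = np.array([
    [[0.0, 0.9, 0.05], [0.0, 0.0, 0.4], [0.0, 0.0, 0.0]],
    [[0.0, 0.7, 0.00], [0.0, 0.0, 0.1], [0.0, 0.0, 0.0]],
    [[0.0, 1.1, 0.02], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]],
])
freq = edge_frequency(W, tau=0.2)
print(freq)
```

In this toy input, the edge 0 → 1 survives in all three subjects, 1 → 2 survives in two of three, and 0 → 2 survives in none. This section does not describe how DAG-MM chooses its threshold, so `tau = 0.2` is arbitrary.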

Highlights

The structural equation model (SEM) in Equation (2) assumes that, for each edge in the DAG, the causal effect decomposes into two subject-specific terms. The first is a fixed-effect term that depends on the observed exogenous covariates X_i. The second is a random-effect term that captures residual heterogeneity in causal effects due to latent factors beyond X_i.
We focus on comparing DAG-MM2 with invariant causal prediction (ICP).
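The per-edge decomposition into a covariate-dependent fixed effect and a subject-specific random effect can be illustrated by simulating data from a toy mixed-effects SEM. This is a sketch under stated assumptions, not the paper's model. It assumes:

- a hypothetical three-node DAG;
- one observation per subject;
- a single scalar covariate `x[i]`;
- linear fixed effects `alpha + gamma * x[i]`;
- Gaussian random effects and noise with assumed standard deviations.

All names and parameter values are illustrative.

```python
import numpy as np


def simulate_msem(n_subjects: int, seed: int = 0):
    """Draw one observation per subject from a toy mixed-effects linear SEM.

    Hypothetical DAG: 0 -> 1, 0 -> 2, 1 -> 2. For subject i and edge k -> j the
    coefficient is
        alpha[k, j] + gamma[k, j] * x[i]   # fixed effect, depends on covariate x[i]
        + b[i, k, j]                       # random effect, subject-specific
    """
    rng = np.random.default_rng(seed)
    p = 3
    parents = {1: [0], 2: [0, 1]}        # nodes 0, 1, 2 are already in topological order
    alpha = np.zeros((p, p))
    gamma = np.zeros((p, p))
    alpha[0, 1], gamma[0, 1] = 0.8, 0.5
    alpha[0, 2], gamma[0, 2] = -0.4, 0.0
    alpha[1, 2], gamma[1, 2] = 0.6, -0.3
    sigma_b, sigma_e = 0.3, 1.0          # assumed random-effect and noise SDs

    x = rng.normal(size=n_subjects)      # observed exogenous covariate, one per subject
    coef = np.zeros((n_subjects, p, p))  # coef[i, k, j]: subject i's weight on edge k -> j
    y = np.zeros((n_subjects, p))
    for j in range(p):                   # generate nodes in topological order
        y[:, j] = rng.normal(scale=sigma_e, size=n_subjects)
        for k in parents.get(j, []):
            b = rng.normal(scale=sigma_b, size=n_subjects)
            coef[:, k, j] = alpha[k, j] + gamma[k, j] * x + b
            y[:, j] += coef[:, k, j] * y[:, k]
    return x, coef, y


x, coef, y = simulate_msem(n_subjects=500)
print(y.shape, coef.shape)
```

Each subject draws its own random effect, so the realized coefficient on edge 0 → 1 scatters around the covariate-dependent mean `0.8 + 0.5 * x[i]` rather than equaling it. This subject-level scatter is the residual heterogeneity that the random-effect term is meant to capture.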

Summary

Introduction

Directed acyclic graphs (DAGs) are used to represent the causal mechanisms of a complex system of interacting components, such as biological cellular pathways (Sachs et al., 2005), gene regulatory networks (Ud-Dean et al., 2016), and brain connectivity networks (Friston, 2011). A widely used method for estimating a DAG from observational data is the PC algorithm, which removes edges through a sequence of conditional independence tests. A limitation of the PC algorithm is that it does not adequately correct for the multiple comparisons made across these tests. In practice, it may therefore produce a large number of false positives. To remedy this limitation, Ha et al. (2016) proposed a hybrid two-stage approach, PenPC. PenPC first estimates a sparse skeleton using penalized regression and then runs a modified PC-stable algorithm on that skeleton.
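The skeleton phase of the PC algorithm removes an edge i–j as soon as some conditioning set makes X_i and X_j look conditionally independent. Each removal decision is a separate hypothesis test with no multiplicity correction, which is the weakness described above. Below is a minimal Python sketch of the PC-stable skeleton phase. PC-stable is the order-independent variant that fixes each node's adjacency set at the start of every level.

The sketch assumes Gaussian data and uses Fisher's z test of partial correlation. The function names are hypothetical. The optional `init_edges` argument only mimics the idea of starting from a pre-screened skeleton. This is not the PenPC implementation: its penalized-regression first stage and its modifications to PC-stable are not reproduced here.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(corr, i, j, cond, n):
    """Two-sided p-value for H0: X_i independent of X_j given X_cond (Gaussian data)."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])           # precision of the sub-matrix
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])   # sample partial correlation
    r = min(max(r, -0.999999), 0.999999)                   # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))                # equals 2 * (1 - Phi(|z|))


def pc_stable_skeleton(data, alpha=0.01, init_edges=None):
    """Undirected skeleton from the PC-stable edge-removal phase."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    if init_edges is None:               # start from the complete undirected graph
        adj = {v: set(range(p)) - {v} for v in range(p)}
    else:                                # or from a pre-screened skeleton
        adj = {v: set() for v in range(p)}
        for i, j in init_edges:
            adj[i].add(j)
            adj[j].add(i)
    level = 0                            # size of the conditioning sets tested
    while any(len(adj[v]) - 1 >= level for v in range(p)):
        frozen = {v: set(adj[v]) for v in range(p)}  # PC-stable: fix neighbourhoods per level
        for i in range(p):
            for j in sorted(adj[i]):
                for cond in itertools.combinations(sorted(frozen[i] - {j}), level):
                    # One uncorrected hypothesis test per (i, j, cond) triple.
                    if fisher_z_pvalue(corr, i, j, cond, n) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return {(i, j) for i in range(p) for j in adj[i] if i < j}


# Toy chain X0 -> X1 -> X2. The noise of X2 is projected off (1, X0, X1) so that
# X0 and X2 are exactly uncorrelated given X1 in this sample, which keeps the
# example reproducible.
rng = np.random.default_rng(1)
n = 1000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
e2 = rng.normal(size=n)
basis = np.column_stack([np.ones(n), x0, x1])
e2 -= basis @ np.linalg.lstsq(basis, e2, rcond=None)[0]
x2 = 0.8 * x1 + e2
data = np.column_stack([x0, x1, x2])
edges = pc_stable_skeleton(data, alpha=0.01)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

The spurious edge 0–2 survives the marginal (level-0) test because X0 and X2 are correlated. It is removed only once X1 enters the conditioning set. With many variables, the number of such tests grows quickly, which is why an uncorrected per-test level `alpha` can let false-positive edges through.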
