Abstract

We consider the problem of jointly estimating multiple related directed acyclic graph (DAG) models based on high-dimensional data from each graph. This problem is motivated by the task of learning gene regulatory networks from gene expression data collected in different tissues, developmental stages or disease states. We prove that, under certain regularity conditions, the proposed $\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to the adjacency matrices consistent with the data-generating distributions and recovers the correct sparsity pattern. In particular, we show that this joint estimation procedure leads to a faster convergence rate than estimating each DAG model separately. As a corollary, we also obtain high-dimensional consistency results for causal inference from a mix of observational and interventional data. For practical purposes, we propose jointGES, which first applies Greedy Equivalence Search (GES) to estimate the union of all DAG models and then uses lasso variable selection to obtain the individual DAGs, and we analyze its consistency guarantees. The proposed method is illustrated through an analysis of simulated data as well as epithelial ovarian cancer gene expression data.
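The second stage of jointGES (lasso variable selection given the estimated union DAG) can be illustrated with a minimal sketch. It assumes linear Gaussian structural equation models, that the GES stage has already returned the union DAG as a parent list per node, and a single hypothetical penalty `lam` shared by all tasks. The names `lasso_cd` and `task_dags_from_union` are illustrative, and the lasso is solved here by plain cyclic coordinate descent rather than by any particular library's solver. Restricting each regression to the union parents keeps every task-specific graph a subgraph of the union DAG, so each estimate is automatically acyclic.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-8):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column
    r = np.array(y, dtype=float)       # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual that excludes b_j
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def task_dags_from_union(union_parents, data, lam):
    """Second stage (sketch): per task, lasso-regress each node on its union parents.

    union_parents: dict {node: list of parent indices} taken from the union DAG
    data: list of (n_k x p) arrays, one per task
    Returns one p x p weight matrix per task; B[i, j] != 0 means i -> j in that task.
    """
    p = data[0].shape[1]
    estimates = []
    for X in data:
        Xc = X - X.mean(axis=0)  # centre the columns so no intercept is needed
        B = np.zeros((p, p))
        for j, parents in union_parents.items():
            if parents:
                B[parents, j] = lasso_cd(Xc[:, parents], Xc[:, j], lam)
        estimates.append(B)
    return estimates
```

Because each task's edges are selected only among its union parents, an edge absent from the union DAG can never appear in a task-specific estimate; the lasso only decides which union edges each task keeps.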

Highlights

  • Methods for structure identification in directed graphical models can be divided into two categories, constraint-based and score-based methods, as well as hybrids of the two

  • In this paper we present jointGES, an algorithm for the joint estimation of multiple related directed acyclic graph (DAG) models from independent realizations

  • Joint estimation is of particular interest in applications where data is collected not from a single DAG, but rather multiple related DAGs, such as gene expression data from different tissues, cell types or from different interventional experiments

Summary

Introduction

Methods for structure identification in directed graphical models can be divided into two categories, constraint-based and score-based methods, as well as hybrids of the two. In applications such as gene expression data from different tissues, developmental stages or disease states, one would expect the underlying regulatory networks to be similar to each other, since they stem from the same species, individual or cell type, but also to have important differences that drive differentiation, development or a certain disease. This raises an important statistical question, namely how to jointly estimate related directed graphical models in order to make effective use of the available data. Our theoretical consistency guarantees explain the empirical findings of [17], namely that estimating a DAG model from interventional data usually leads to better recovery rates than estimating a DAG model from the same amount of purely observational data. These theoretical results are based on the global optimum of $\ell_0$-penalized maximum likelihood estimation. For practical purposes we propose jointGES; we analyze its properties from a theoretical point of view and test its performance on synthetic data and on gene expression data from epithelial ovarian cancer.
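To make the $\ell_0$-penalized objective concrete, the following sketch scores a candidate collection of DAG supports, one per task, assuming linear Gaussian structural equation models. Each node's edge weights are profiled out by least squares, the Gaussian negative log-likelihood contributes $\frac{n_k}{2}\log\hat\sigma_{kj}^2$ for node $j$ in task $k$ (up to constants), and every edge costs a penalty `lam`. Penalizing the total edge count across tasks is an illustrative choice, not necessarily the paper's exact penalty, and `joint_l0_score` and `is_acyclic` are hypothetical names. Cyclic supports are scored as infinite so that only DAGs are feasible.

```python
import numpy as np


def is_acyclic(A):
    """Kahn's algorithm; A[i, j] True means an edge i -> j."""
    A = np.asarray(A, dtype=bool)
    p = A.shape[0]
    indeg = A.sum(axis=0).astype(int)
    queue = [j for j in range(p) if indeg[j] == 0]
    seen = 0
    while queue:
        i = queue.pop()
        seen += 1
        for j in np.flatnonzero(A[i]):
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == p  # every node removed <=> no directed cycle


def joint_l0_score(data, supports, lam):
    """l0-penalized profile negative log-likelihood summed over tasks (lower is better).

    data:     list of (n_k x p) arrays, one per task
    supports: list of p x p boolean arrays; supports[k][i, j] means i -> j in task k
    lam:      penalty per edge (hypothetical tuning parameter)
    """
    total = 0.0
    for X, A in zip(data, supports):
        A = np.asarray(A, dtype=bool)
        if not is_acyclic(A):
            return np.inf  # only DAGs are feasible
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        for j in range(p):
            pa = np.flatnonzero(A[:, j])
            resid = Xc[:, j]
            if pa.size:
                # least-squares edge weights for node j given its parents
                w = np.linalg.lstsq(Xc[:, pa], Xc[:, j], rcond=None)[0]
                resid = Xc[:, j] - Xc[:, pa] @ w
            # Gaussian profile likelihood term: (n/2) * log(residual variance)
            total += 0.5 * n * np.log(np.mean(resid ** 2))
        total += lam * A.sum()  # l0 penalty: number of edges in this task
    return total
```

With a BIC-like choice such as `lam = np.log(n)`, the true supports should score lower than both the empty graphs and graphs with a spurious edge. Markov-equivalent DAGs receive the same score, which is why score-based methods such as GES search over equivalence classes rather than individual DAGs.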
