Abstract

Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005), and subsequently extended in Hyvärinen (2007). The regularized score matching method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids issues of asymmetry that arise when applying the technique of neighborhood selection, and compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under ℓ1 regularization. Under suitable irrepresentability conditions, we show that ℓ1-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.
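
In the Gaussian case the score matching loss has a closed form, which is what makes it quadratic. The following is a minimal sketch that assumes centered data with sample second-moment matrix S and a Gaussian density proportional to exp(−xᵀKx/2) with symmetric precision matrix K. Then ∇ log p(x) = −Kx and the Laplacian of log p is −tr(K), so the empirical loss reduces to (1/2) tr(KSK) − tr(K). The helper names `gaussian_sm_loss` and `gaussian_sm_loss_vec` are hypothetical. The code evaluates the loss both as a trace and as an explicit quadratic form in vec(K), and checks that without a penalty the minimizer is S⁻¹.

```python
import numpy as np


def gaussian_sm_loss(K, S):
    """Gaussian score matching loss (1/2) tr(K S K) - tr(K).

    Assumes a centered Gaussian with precision K, so grad log p(x) = -K x
    and the Laplacian of log p is -tr(K); S is the second-moment matrix.
    """
    return 0.5 * np.trace(K @ S @ K) - np.trace(K)


def gaussian_sm_loss_vec(K, S):
    """The same loss as an explicit quadratic form in vec(K), valid for symmetric K."""
    p = S.shape[0]
    k = K.ravel()                      # row-major stacking of the rows of K
    hessian = np.kron(np.eye(p), S)    # block-diagonal: one copy of S per row
    return 0.5 * k @ hessian @ k - np.eye(p).ravel() @ k


rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.standard_normal((n, p))
S = X.T @ X / n                        # sample second-moment matrix

A = rng.standard_normal((p, p))
K = A + A.T                            # arbitrary symmetric test matrix
print(np.isclose(gaussian_sm_loss(K, S), gaussian_sm_loss_vec(K, S)))  # → True

# Without a penalty the stationarity condition (S K + K S) / 2 = I
# is solved by K = S^{-1}.
K_hat = np.linalg.inv(S)
print(np.allclose(0.5 * (S @ K_hat + K_hat @ S), np.eye(p)))  # → True
```

Because the Hessian is a fixed matrix built from S, adding an ℓ1 penalty yields a lasso-type problem. This is the structural reason the abstract gives for piecewise linear solution paths.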

Highlights

  • Other possibilities for choosing the regularization parameter include generalized cross-validation (GCV) (Tibshirani, 1996), Akaike's Information Criterion (AIC), approaches based on stability under resampling (Meinshausen and Bühlmann, 2010; Shah and Samworth, 2013; Liu, Roeder and Wasserman, 2010), and the Bayesian Information Criterion (BIC) (Schwarz, 1978), as well as extensions of BIC proposed to cope with large model spaces (Chen and Chen, 2008; Gao et al., 2012; Foygel and Drton, 2010b; Barber and Drton, 2015)

  • This paper proposes the use of regularized score matching for estimation of conditional independence graphs in high dimensions
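
A minimal sketch of ℓ1-regularized score matching for a Gaussian graphical model follows. It assumes a penalty λ Σ_{i<j} |K_ij| on off-diagonal entries only, uses plain cyclic coordinate descent as the solver, and extracts the graph as the nonzero off-diagonal pattern of K. The paper's own algorithm and tuning rule may differ, and `rsm_gaussian`, `edge_set` and `soft_threshold` are hypothetical names. For symmetric K the loss splits as Σ_i [(1/2) k_iᵀ S k_i − K_ii], with k_i the i-th row. Each coordinate update is therefore a one-dimensional quadratic, plus an absolute value for off-diagonal pairs, and is solved in closed form by soft-thresholding. Each pair K_ij = K_ji is updated as a single coordinate, so the estimate is symmetric by construction. Neighborhood selection, by contrast, fits p separate regressions and must reconcile their supports afterwards.

```python
import numpy as np


def soft_threshold(c, lam):
    """Return sign(c) * max(|c| - lam, 0), the scalar lasso shrinkage."""
    return np.sign(c) * max(abs(c) - lam, 0.0)


def rsm_gaussian(S, lam, K0=None, n_sweeps=500, tol=1e-10):
    """Cyclic coordinate descent (hypothetical helper, not the authors' code) for
        (1/2) tr(K S K) - tr(K) + lam * sum_{i<j} |K_ij|
    over symmetric K, with S the sample second-moment matrix of centered data.
    """
    p = S.shape[0]
    # Warm start if given, otherwise the solution for very large lam.
    K = np.diag(1.0 / np.diag(S)) if K0 is None else K0.copy()
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Diagonal entry K_jj is unpenalized: solve S_jj K_jj + r - 1 = 0.
            r = S[j] @ K[j] - S[j, j] * K[j, j]
            new = (1.0 - r) / S[j, j]
            max_change = max(max_change, abs(new - K[j, j]))
            K[j, j] = new
            for i in range(j):
                # The pair K_ij = K_ji enters rows i and j of the loss, giving
                # (a/2) t^2 + c t + lam |t| with c excluding K_ij itself.
                a = S[i, i] + S[j, j]
                c = (S[j] @ K[i] - S[j, j] * K[i, j]) + (S[i] @ K[j] - S[i, i] * K[j, i])
                new = -soft_threshold(c, lam) / a
                max_change = max(max_change, abs(new - K[i, j]))
                K[i, j] = K[j, i] = new  # symmetric by construction
        if max_change < tol:
            break
    return K


def edge_set(K, eps=0.0):
    """Edges of the estimated graph: pairs i < j with |K_ij| > eps."""
    p = K.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(K[i, j]) > eps}


# Usage: data from a chain graph (tridiagonal precision matrix), centered by construction.
rng = np.random.default_rng(1)
p, n = 5, 400
K_true = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(K_true), size=n)
S = X.T @ X / n

K_hat = None
for lam in [0.5, 0.2, 0.05]:  # decreasing grid with warm starts
    K_hat = rsm_gaussian(S, lam, K0=K_hat)
    print(lam, sorted(edge_set(K_hat)))
```

Each sweep costs O(p³), because every coordinate update uses inner products of length p. This is kept simple for readability. A faster solver could update K S incrementally or follow the piecewise linear path directly, and the grid of λ values would in practice be chosen by one of the criteria listed above.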

Summary

Introduction

Undirected graphical models, known as Markov random fields, are important tools for summarizing dependency relationships between random variables and have found application in many fields, including bioinformatics, language and speech processing, and digital communications. Addressing the case of continuous but not necessarily Gaussian observations, the proposed method is based on the score matching loss, first introduced by Hyvärinen (2005) in the setting of image analysis. As we demonstrate for Gaussian graphical models, regularized score matching exhibits state-of-the-art statistical efficiency in high-dimensional settings. In the Gaussian setting, regularized score matching is structurally closest to pseudo-likelihood methods with symmetry constraints, such as SPACE (Peng et al., 2009), the symmetric lasso (Friedman, Hastie and Tibshirani, 2010) and SPLICE (Rocha, Zhao and Yu, 2008). We also explore regularization of the non-negative score matching loss as a tool for estimating conditional independence graphs from high-dimensional non-negative data, and we establish consistency of the method.
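
For reference, the losses discussed above can be written in standard notation as follows. This is a summary, not a reproduction of the paper's equations. Here p_0 denotes the data-generating density and log p is assumed twice differentiable. Boundary terms are assumed to vanish so that integration by parts applies, and additive constants not depending on p are dropped. The first line is the loss of Hyvärinen (2005) and the second its non-negative variant from Hyvärinen (2007). The last two lines specialize these to a centered Gaussian with precision matrix K and to its truncation to ℝ_+^p, both with log density −xᵀKx/2 up to a constant. In both cases the loss is quadratic in K.

```latex
\begin{align*}
J(p) &= \int_{\mathbb{R}^p} p_0(x) \sum_{j=1}^{p} \Big[ \partial_j^2 \log p(x) + \tfrac{1}{2} \big(\partial_j \log p(x)\big)^2 \Big] \, dx, \\
J_+(p) &= \int_{\mathbb{R}_+^p} p_0(x) \sum_{j=1}^{p} \Big[ 2 x_j \, \partial_j \log p(x) + x_j^2 \, \partial_j^2 \log p(x) + \tfrac{1}{2} x_j^2 \big(\partial_j \log p(x)\big)^2 \Big] \, dx, \\
J(K) &= \tfrac{1}{2} \operatorname{tr}(K \Sigma K) - \operatorname{tr}(K), \qquad \Sigma = \mathbb{E}_{p_0}[x x^\top], \\
J_+(K) &= \mathbb{E}_{p_0} \sum_{j=1}^{p} \Big[ \tfrac{1}{2} x_j^2 (Kx)_j^2 - 2 x_j (Kx)_j - x_j^2 K_{jj} \Big].
\end{align*}
```

The Gaussian lines follow from ∂_j log p(x) = −(Kx)_j and ∂_j² log p(x) = −K_jj. Replacing the expectation by a sample average gives the empirical loss to which the ℓ1 penalty is added.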

Score matching
Basic score matching
Extension to non-negative data
Score matching in exponential families
Pairwise interaction models
Regularized score matching
Methodology
Uniqueness of rSME
Piecewise linear paths
Tuning
Numerical experiments
Gaussian data
Non-negative Gaussian data
Normal conditionals
A robustness check
Contaminated Gaussians
Multivariate t-distributed observations
Application to RNAseq data
Theory
Setup and notation
Irrepresentability
Main results
Proof of Theorem 1
Proof of Corollary 1
Proof of Corollary 2
Discussion
Findings
Gaussian experiments
