Abstract

Selecting informative nodes over large-scale networks has become increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs on large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior, which uses the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation, which is scalable to large-scale networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of the breast cancer gene expression dataset in The Cancer Genome Atlas (TCGA).

Highlights

  • In biomedical research, complex biological systems are often modeled or represented as biological networks (Kitano, 2002).

  • To address the limitations of existing methods, we propose a new prior model, the thresholded graph Laplacian Gaussian (TGLG) prior, which performs network marker selection over large-scale networks by thresholding a latent continuous variable attached to each node.

  • Following the settings in Li and Li (2008), Zhe et al. (2013) and Kim et al. (2013), we simulate small simple gene networks consisting of multiple subnetworks, where each subnetwork contains one transcription factor (TF) gene and 10 target genes connected to the TF gene; two of the subnetworks are set as the true network markers.

Summary

Introduction

Complex biological systems are often modeled or represented as biological networks (Kitano, 2002). To address the limitations of existing methods, we propose a new prior model, the thresholded graph Laplacian Gaussian (TGLG) prior, which performs network marker selection over large-scale networks by thresholding a latent continuous variable attached to each node. We build the threshold priors using the graph Laplacian matrix, which has been used to capture the structural dependence between neighboring nodes (Li and Li, 2008; Zhe et al., 2013; Li and Li, 2010). Most of those frequentist methods directly specify the graph Laplacian matrix from the existing biological network.
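
The thresholding idea can be illustrated with a minimal sketch. It is not the paper's exact prior specification: it assumes the latent variable is Gaussian with precision matrix L + eps*I (L being the graph Laplacian of the known network) and that a node is selected when the absolute value of its latent variable exceeds a fixed threshold. The function names, the ridge term eps, and the threshold value are hypothetical choices made for illustration only. The toy network follows the simulation design described in the highlights (one TF gene connected to 10 target genes per subnetwork).

```python
import numpy as np


def star_subnetwork_adjacency(n_subnetworks, n_targets=10):
    """Adjacency matrix of disjoint star subnetworks (one TF gene plus its targets)."""
    size = n_targets + 1
    p = n_subnetworks * size
    A = np.zeros((p, p))
    for s in range(n_subnetworks):
        tf = s * size                          # index of the TF gene
        targets = np.arange(tf + 1, tf + size)  # indices of its target genes
        A[tf, targets] = 1.0
        A[targets, tf] = 1.0
    return A


def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A, with D the diagonal degree matrix."""
    return np.diag(A.sum(axis=1)) - A


def sample_latent(L, eps, rng):
    """Draw gamma ~ N(0, (L + eps*I)^{-1}).

    Assumption: eps > 0 is a small ridge term that makes the precision
    matrix positive definite; it is not taken from the source.
    """
    Q = L + eps * np.eye(L.shape[0])
    C = np.linalg.cholesky(Q)          # Q = C C^T, C lower triangular
    z = rng.standard_normal(L.shape[0])
    return np.linalg.solve(C.T, z)     # covariance of the result is Q^{-1}


def select_nodes(gamma, threshold):
    """Hard-threshold the latent variable: a node is selected if |gamma_j| > threshold."""
    return np.abs(gamma) > threshold


rng = np.random.default_rng(0)
A = star_subnetwork_adjacency(n_subnetworks=3)
L = graph_laplacian(A)
gamma = sample_latent(L, eps=0.1, rng=rng)
selected = select_nodes(gamma, threshold=1.0)  # hypothetical threshold value
```

Because neighboring nodes are positively correlated under this Gaussian, the latent values of a TF gene and its targets tend to cross the threshold together, which is the intuition behind selecting connected markers rather than isolated ones.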

The Model
Theoretical Properties
Posterior Computation
Small Simple Networks
Large Scale-Free Networks
Application to Breast Cancer Data from the Cancer Genome Atlas
Findings
Discussion