Abstract

In molecular biology, advances in high-throughput technologies have made it possible to study complex multivariate phenotypes and their simultaneous associations with high-dimensional genomic and other omics data. This problem can be addressed with high-dimensional multi-response regression, where the response variables are potentially highly correlated. To this end, we recently introduced several multivariate Bayesian variable and covariance selection models, e.g., Bayesian estimation methods for sparse seemingly unrelated regression for variable and covariance selection. Several variable selection priors have been implemented in this context, in particular the hotspot detection prior for latent variable inclusion indicators, which results in sparse variable selection for associations between predictors and multiple phenotypes. We also propose an alternative, which uses a Markov random field (MRF) prior to incorporate prior knowledge about the dependence structure of the inclusion indicators. Inference of Bayesian seemingly unrelated regression (SUR) models by Markov chain Monte Carlo methods is made computationally feasible by factorisation of the covariance matrix amongst the response variables. In this paper we present BayesSUR, an R package that allows the user to easily specify and run a range of different Bayesian SUR models, which have been implemented in C++ for computational efficiency. The R package allows the specification of the models in a modular way, where the user chooses the priors for variable selection and for covariance selection separately. We demonstrate the performance of sparse SUR models with the hotspot prior and spike-and-slab MRF prior on synthetic and real data sets representing eQTL or mQTL studies and in vitro anti-cancer drug screening studies, as examples of typical applications.
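One standard way to factorise a residual covariance matrix, shown here as an illustration rather than as the package's exact parametrisation, is to write each response's residual as a regression on the residuals of the preceding responses: ε_k = Σ_{l<k} ρ_{kl} ε_l + u_k, with independent u_k ~ N(0, σ_k²). Stacking the ρ_{kl} into a strictly lower-triangular matrix R and the σ_k² into D = diag(σ²) gives Σ = (I − R)^{-1} D (I − R)^{-T}, so a sampler can update unconstrained coefficients and positive scalars instead of a positive-definite matrix. The NumPy sketch below (function names are hypothetical) maps between the two parametrisations and checks the round trip.

```python
import numpy as np

def sigma_from_factors(R, d):
    """Build a residual covariance matrix from conditional-regression factors.

    R: strictly lower-triangular (s x s) matrix; R[k, l] is the coefficient of
       residual l in the regression of residual k on earlier residuals (l < k).
    d: length-s vector of conditional (innovation) variances sigma_k^2 > 0.
    """
    s = len(d)
    A = np.linalg.inv(np.eye(s) - R)   # eps = (I - R)^{-1} u
    return A @ np.diag(d) @ A.T        # Sigma = A D A^T

def factors_from_sigma(Sigma):
    """Invert the map: recover R and d by sequential population regressions."""
    s = Sigma.shape[0]
    R = np.zeros((s, s))
    d = np.zeros(s)
    d[0] = Sigma[0, 0]                         # first residual has no predecessors
    for k in range(1, s):
        S_prev = Sigma[:k, :k]                 # covariance of earlier residuals
        c = Sigma[:k, k]                       # their covariance with residual k
        coef = np.linalg.solve(S_prev, c)      # regression coefficients rho_{k,l}
        R[k, :k] = coef
        d[k] = Sigma[k, k] - c @ coef          # conditional variance sigma_k^2
    return R, d

# Usage: random factors -> covariance -> factors again
rng = np.random.default_rng(0)
R = np.tril(rng.normal(size=(4, 4)), k=-1)
d = rng.uniform(0.5, 2.0, size=4)
Sigma = sigma_from_factors(R, d)
R_back, d_back = factors_from_sigma(Sigma)
```

Because every σ_k² is positive and I − R is unit lower-triangular, any (R, σ²) pair yields a valid covariance matrix, which is the practical appeal of sampling on the factorised scale.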

Highlights

  • With the development of high-throughput technologies in molecular biology, the large-scale molecular characterization of biological samples has become commonplace, for example by genome-wide measurement of gene expression, single nucleotide polymorphisms (SNP) or CpG methylation status

  • Our software package BayesSUR (Banterle, Zhao, and Lewin 2021) gathers together several models that we have proposed for high-dimensional regression of multiple responses and introduces a novel model, allowing for different priors for variable selection in the regression models and for different assumptions about the dependence structure between responses

  • The BayesSUR package implements a series of multivariate Bayesian variable selection models, for which the evolutionary stochastic search (ESS) algorithm is employed for posterior inference over the model space
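ESS runs several Markov chains at different temperatures and combines within-chain updates with moves between chains. The sketch below shows only one generic ingredient of such population samplers, a parallel-tempering exchange move between two tempered chains; it is not the BayesSUR implementation, the function names are hypothetical, and other ESS moves (such as crossover) are not shown. Chain c is assumed to target p(x)^(1/t_c), so swapping the states of chains i and j is accepted with probability min(1, exp((1/t_i − 1/t_j)(log p(x_j) − log p(x_i)))).

```python
import math
import random

def exchange_accept_prob(logp_i, logp_j, t_i, t_j):
    """Acceptance probability for swapping the states of chains i and j.

    Chain c targets p(x)^(1 / t_c); logp_i and logp_j are the untempered log
    posterior values of the states currently held by chains i and j.
    """
    log_ratio = (1.0 / t_i - 1.0 / t_j) * (logp_j - logp_i)
    # Avoid calling exp on large positive values: accept outright instead.
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def exchange_move(states, logps, temps, i, j, rng=random):
    """Propose swapping the states of chains i and j in place; True if accepted."""
    if rng.random() < exchange_accept_prob(logps[i], logps[j], temps[i], temps[j]):
        states[i], states[j] = states[j], states[i]
        logps[i], logps[j] = logps[j], logps[i]
        return True
    return False
```

In a variable selection setting the states would be, for example, inclusion-indicator matrices, and exchange moves would be interleaved with local updates within each chain so that hotter chains can carry well-supported models back to the chain at temperature 1.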


Summary

Introduction

With the development of high-throughput technologies in molecular biology, the large-scale molecular characterization of biological samples has become commonplace, for example by genome-wide measurement of gene expression, single nucleotide polymorphisms (SNP) or CpG methylation status. Bottolo et al. (2011) and Lewin et al. (2015b) further proposed the hotspot prior for variable selection in multivariate regression, in which the probability of association between the predictors and responses is decomposed multiplicatively into predictor and response random effects. This prior is implemented in a multivariate Bayesian hierarchical regression setup in the software R2HESS (Lewin, Campanella, Saadi, Liquet, and Chadeau-Hyam 2015a), available from https://www.mrc-bsu.cam.ac.uk/software/. The BayesSUR package implements many of these possible choices for high-dimensional multi-response regressions by allowing the user to choose among three different prior structures for the residual covariance matrix and among three priors for the joint distribution of the variable selection indicators. This includes a novel model setup, where the MRF prior for incorporating prior knowledge about the dependence structure of the inclusion indicators is combined with Bayesian SUR models (Zhao, Banterle, Lewin, and Zucknick 2021).
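The multiplicative decomposition behind the hotspot prior can be sketched in a few lines of NumPy. The indexing, the hyperpriors (a Beta distribution for the response effects o_k and a Gamma distribution for the predictor effects π_j), the default hyperparameter values and the function names are all assumptions for illustration, not the BayesSUR or R2HESS code. The prior probability that predictor j is associated with response k is taken as ω_jk = π_j o_k, clipped at 1 here as one simple way to keep it a valid probability; a large π_j raises the inclusion probability of predictor j for every response at once, which is what encourages "hotspot" predictors associated with many responses.

```python
import numpy as np

def hotspot_inclusion_probs(o, pi):
    """Inclusion probabilities omega[j, k] = min(1, pi[j] * o[k]).

    o:  length-s array of response-specific effects in (0, 1)
    pi: length-p array of non-negative predictor-specific propensities
    """
    omega = np.outer(pi, o)          # p x s matrix of products pi_j * o_k
    return np.minimum(omega, 1.0)    # keep every entry a valid probability

def sample_hotspot_indicators(p, s, a_o=2.0, b_o=20.0, a_pi=2.0, b_pi=1.0, seed=0):
    """Draw (o, pi, gamma) from an assumed hotspot-style prior.

    Hyperparameter defaults are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    o = rng.beta(a_o, b_o, size=s)                          # response effects
    pi = rng.gamma(shape=a_pi, scale=1.0 / b_pi, size=p)    # Gamma with rate b_pi
    omega = hotspot_inclusion_probs(o, pi)
    gamma = (rng.random((p, s)) < omega).astype(int)        # Bernoulli(omega) indicators
    return o, pi, gamma

# Usage: predictor 1 has three times the propensity of predictor 0
omega = hotspot_inclusion_probs(np.array([0.1, 0.5]), np.array([1.0, 3.0]))
o, pi, gamma = sample_hotspot_indicators(p=5, s=3)
```

Here `omega` is [[0.1, 0.5], [0.3, 1.0]]: the second predictor's row is uniformly higher, illustrating how a single predictor effect shifts its associations with all responses together.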

Models specification
MCMC sampler and posterior inference
The R package BayesSUR
Quick start with a simple example
Two extended examples based on real data
Simulated eQTL data
The genomics of drug sensitivity in cancer data
Conclusion
The elpd
Posterior predictive for the HRR model
Posterior predictive for the dSUR and SSUR models