Abstract

In high-throughput genetics studies, an important aim is to identify gene–environment (G×E) interactions associated with clinical outcomes. Recently, multiple marginal penalization methods have been developed and shown to be effective in G×E studies. However, within the Bayesian framework, marginal variable selection has not received much attention. In this study, we propose a novel marginal Bayesian variable selection method for G×E studies. In particular, our marginal Bayesian method is robust to data contamination and outliers in the outcome variable. Incorporating spike-and-slab priors, we have implemented a Gibbs sampler for Markov chain Monte Carlo (MCMC) inference. The proposed method outperforms a number of alternatives in extensive simulation studies. The utility of the marginal robust Bayesian variable selection method is further demonstrated in a case study using data from the Nurses' Health Study (NHS). Some of the main and interaction effects identified in the real data analysis have important biological implications.
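The robustness and spike-and-slab ingredients named in the abstract can be illustrated with a generic sketch. The notation below (y_i, E_{ik}, G_{ij}, \alpha, \gamma_j, \eta_{jk}, \tau, \phi_j, s_j, \pi) is assumed for illustration and is not taken from the paper; the paper's actual model and prior hierarchy may differ.

```latex
% Illustrative marginal model for the j-th genetic factor (assumed notation):
% outcome y_i, environmental factors E_{ik}, genetic factor G_{ij}.
y_i = \alpha_0 + \sum_{k=1}^{q} \alpha_k E_{ik} + \gamma_j G_{ij}
      + \sum_{k=1}^{q} \eta_{jk} E_{ik} G_{ij} + \epsilon_i ,
\qquad i = 1, \dots, n .

% Least absolute deviation (LAD) loss, which is less sensitive to outliers
% in y_i than the squared-error loss:
\min \; \sum_{i=1}^{n} \lvert y_i - \mu_i \rvert ,
\qquad \mu_i = y_i - \epsilon_i .

% Minimizing the LAD loss is equivalent to maximizing the likelihood under
% i.i.d. Laplace errors with rate \tau > 0:
f(\epsilon_i \mid \tau) = \frac{\tau}{2} \exp\!\left( -\tau \lvert \epsilon_i \rvert \right) .

% A generic spike-and-slab prior on a coefficient: a point mass at zero
% (spike) mixed with a diffuse normal (slab), with inclusion indicator \phi_j:
\gamma_j \mid \phi_j, s_j \sim \phi_j \, \mathrm{N}(0, s_j) + (1 - \phi_j) \, \delta_0(\gamma_j) ,
\qquad \phi_j \mid \pi \sim \mathrm{Bernoulli}(\pi) .
```

Under a formulation of this kind, the posterior mean of \phi_j gives an inclusion probability for each effect, which is one common way to carry out selection with uncertainty quantification from MCMC draws.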

Highlights

  • The risk and progression of complex diseases including cancer, asthma and type 2 diabetes are associated with the coordinated functioning of genetic factors, the environmental factors, as well as their interactions (Hunter, 2005; Von Mutius, 2009; Cornelis and Hu, 2012; Simonds et al, 2016)

  • We consider single-nucleotide polymorphisms (SNPs) on chromosome 10 to identify main effects and gene–environment interactions associated with weight, which is an important phenotypic trait related to type 2 diabetes

  • G×E interaction studies have been mainly conducted through marginal hypothesis testing, based on a diversity of study designs utilizing parametric, nonparametric, and semiparametric models (Murcray et al, 2009; Thomas, 2010; Mukherjee et al, 2012), which later have been extended to joint analyses driven primarily by the pathway or gene set based association studies (Wu and Cui, 2013a; Jin et al, 2014; Jiang et al, 2017)

Summary

INTRODUCTION

The risk and progression of complex diseases including cancer, asthma and type 2 diabetes are associated with the coordinated functioning of genetic factors, environmental (and clinical) factors, as well as their interactions (Hunter, 2005; Von Mutius, 2009; Cornelis and Hu, 2012; Simonds et al, 2016). Zhang et al (2020) have constructed 95% confidence intervals based on the observed occurrence index (OOI) values (Huang and Ma, 2010); this measure has been used to demonstrate the stability of identified effects rather than to quantify the uncertainty of penalized estimates. These limitations have motivated us to consider Bayesian analyses. Ren et al (2020) have further incorporated simultaneous selection of linear and nonlinear G×E interactions while accounting for structured identification in the Bayesian adaptive shrinkage framework. All these fully Bayesian methods can efficiently provide uncertainty quantification based on the posterior samples from MCMC. The findings from the case study of the Nurses’ Health Study (NHS) with SNP measurements have important biological implications.

METHOD
Bayesian Formulation of the LAD Regression
Bayesian LAD LASSO With
SIMULATION
REAL DATA ANALYSIS
DISCUSSION
DATA AVAILABILITY STATEMENT
ETHICS STATEMENT