Abstract

Motivation: Gene–environment (GxE) interactions are one of the least studied aspects of the genetic architecture of human traits and diseases. The environment of an individual is inherently high dimensional, evolves through time and can be expensive and time consuming to measure. The UK Biobank study, with all 500 000 participants having undergone an extensive baseline questionnaire, represents a unique opportunity to assess GxE heritability for many traits and diseases in a well powered setting.

Results: We have developed a randomized Haseman–Elston non-linear regression method applicable when many environmental variables have been measured on each individual. The method (GPLEMMA) simultaneously estimates a linear environmental score (ES) and its GxE heritability. We compare the method via simulation to a whole-genome regression approach (LEMMA) for estimating GxE heritability. We show that GPLEMMA is more computationally efficient than LEMMA on large datasets, and produces results highly correlated with those from LEMMA when applied to simulated data and real data from the UK Biobank.

Availability and implementation: Software implementing the GPLEMMA method is available from https://jmarchini.org/gplemma/.

Supplementary information: Supplementary data are available at Bioinformatics online.
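
To make "a linear environmental score and its GxE heritability" concrete, here is a minimal sketch of the kind of model such methods fit. It assumes N individuals, an N×M matrix X of standardized SNPs, an N×L matrix E of environmental variables and a weight vector ω. The symbols are our own labels, and covariates and environmental main effects are omitted for brevity, so this is not necessarily the paper's exact formulation.

```latex
\[
\begin{aligned}
\eta &= E\omega, \\
y &= X\beta + \operatorname{diag}(\eta)\,X\gamma + \epsilon, \\
\beta &\sim \mathcal{N}\!\left(0, \tfrac{\sigma_g^2}{M} I_M\right), \quad
\gamma \sim \mathcal{N}\!\left(0, \tfrac{\sigma_{gxe}^2}{M} I_M\right), \quad
\epsilon \sim \mathcal{N}\!\left(0, \sigma_e^2 I_N\right), \\
\operatorname{Var}(y) &= \sigma_g^2 K
  + \sigma_{gxe}^2 \operatorname{diag}(\eta)\, K \operatorname{diag}(\eta)
  + \sigma_e^2 I_N, \qquad K = \tfrac{1}{M} X X^\top .
\end{aligned}
\]
```

The GxE term can also be written as σ²_gxe (K ∘ ηη^T), which is quadratic in ω. Fitting these covariance terms to the entries of yy^T by least squares (the Haseman–Elston idea) therefore becomes a non-linear regression in ω. In this sketch ω and σ²_gxe are identified only up to a joint rescaling, so some normalization of η would be needed.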

Highlights

  • We compared GPLEMMA to the whole-genome regression model in LEMMA, which estimates an environmental score (ES) and uses it to estimate the GxE heritability

  • This paper develops a novel randomized Haseman–Elston non-linear regression approach for modelling GxE interactions in large genetic studies with multiple environmental variables. This approach estimates GxE heritability at the same time as estimating the linear combination of environmental variables that underlies that heritability. This general idea was pioneered in our previous approach, LEMMA[29], which used a whole-genome regression approach to learn the ES and then used that ES in a randomized Haseman–Elston approach to estimate GxE heritability
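
The randomized Haseman–Elston step can be illustrated for the simpler case in which the ES is already fixed, as in LEMMA's second stage. What follows is a minimal method-of-moments sketch under stated assumptions: standardized genotypes, a known ES `eta`, no covariates, and kernels K = X X^T / M, diag(eta) K diag(eta) and I. Exact traces are replaced by Hutchinson estimates from random ±1 probe vectors, so only products with X are needed. It does not learn the ES weights (the part GPLEMMA's non-linear regression adds), and the function name `randomized_he_gxe` and all simulation settings are hypothetical.

```python
import numpy as np


def randomized_he_gxe(X, eta, y, n_probes=50, seed=0):
    """Randomized Haseman-Elston (method-of-moments) estimates of
    (sigma2_g, sigma2_gxe, sigma2_e) for a FIXED environmental score eta.

    Kernels: K = X X^T / M, K_gxe = diag(eta) K diag(eta), and I.
    Covariates are ignored; X columns are assumed standardized.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    eta_col = eta.reshape(-1, 1)

    # Kernel-times-matrix products; the n x n kernels are never formed.
    def k_g(V):
        return X @ (X.T @ V) / m

    def k_gxe(V):
        return eta_col * k_g(eta_col * V)

    def k_e(V):
        return V

    kernels = [k_g, k_gxe, k_e]

    # Hutchinson estimator with Rademacher probes z_b:
    # tr(A_k A_l) ~ mean over b of (A_k z_b)^T (A_l z_b), valid because
    # every kernel A_k is symmetric.
    Z = rng.choice([-1.0, 1.0], size=(n, n_probes))
    AZ = [k(Z) for k in kernels]
    T = np.array([[np.sum(a * b) / n_probes for b in AZ] for a in AZ])

    # Right-hand side of the moment equations: y^T A_k y.
    y_col = y.reshape(-1, 1)
    rhs = np.array([np.sum(y_col * k(y_col)) for k in kernels])

    # Solve sum_l tr(A_k A_l) sigma_l = y^T A_k y for the three components.
    return np.linalg.solve(T, rhs)


# Usage on simulated data (all settings are hypothetical).
rng = np.random.default_rng(2)
n, m = 4000, 1000
freqs = rng.uniform(0.2, 0.5, size=m)
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)
eta = rng.normal(size=n)
eta = (eta - eta.mean()) / eta.std()

# True components: sigma2_g = 0.3, sigma2_gxe = 0.2, sigma2_e = 0.5.
beta = rng.normal(0.0, np.sqrt(0.3 / m), size=m)
gamma = rng.normal(0.0, np.sqrt(0.2 / m), size=m)
y = X @ beta + eta * (X @ gamma) + rng.normal(0.0, np.sqrt(0.5), size=n)

sigmas = randomized_he_gxe(X, eta, y)
# Proportions of variance; both genetic kernels have mean diagonal close to 1.
h2 = sigmas / sigmas.sum()
print("sigma2 (g, gxe, e):", np.round(sigmas, 3), "h2:", np.round(h2, 3))
```

Working with X only through products such as `X @ (X.T @ V)` keeps memory at O(NM) rather than O(N²), which is what makes randomized Haseman–Elston estimators attractive at biobank scale.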

Summary

Introduction

The advent of genome-wide association studies[1] has catalyzed a huge number of discoveries linking genetic markers to many human complex diseases and traits. For the most part, these discoveries have involved common variants that confer relatively small amounts of risk and account for only a small proportion of the phenotypic variance of a trait[2]. This has led to a surge of interest in methods and applications, typically based on linear mixed models (LMMs), that measure the joint contribution of all measured variants throughout the genome to phenotypic variance (SNP heritability), and in testing individual variants within this framework. When testing variants for association, LMMs can reduce false-positive associations due to population structure and improve power by implicitly conditioning on other loci across the genome[4,5,6]. These methods model the unobserved polygenic contribution as a multivariate Gaussian with covariance proportional to a genetic relationship matrix (GRM)[7–9]. This approach is mathematically equivalent to a whole-genome regression (WGR) model with a Gaussian prior over SNP effects[4].
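
The LMM/WGR equivalence can be checked numerically. This sketch assumes standardized genotypes, a GRM defined as K = X X^T / M and a WGR prior beta_j ~ N(0, sigma2_g / M); the simulated allele frequencies, sizes and helper names (`standardized_genotypes`, `grm`) are hypothetical. It draws many effect vectors under the WGR prior and compares the empirical covariance of the polygenic values X beta with the LMM covariance sigma2_g K.

```python
import numpy as np


def standardized_genotypes(n, m, rng):
    """Simulate hypothetical SNP dosages (0/1/2) and standardize each column."""
    freqs = rng.uniform(0.2, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    return (G - G.mean(axis=0)) / G.std(axis=0)


def grm(X):
    """Genetic relationship matrix K = X X^T / M for standardized genotypes X."""
    return X @ X.T / X.shape[1]


rng = np.random.default_rng(1)
n, m, sigma2_g = 50, 200, 0.4
X = standardized_genotypes(n, m, rng)
K = grm(X)

# WGR view: polygenic values g = X beta, with beta_j ~ N(0, sigma2_g / M).
draws = 20000
beta = rng.normal(0.0, np.sqrt(sigma2_g / m), size=(m, draws))
g = X @ beta  # shape (n, draws): one polygenic vector per column

# Empirical covariance across draws (the prior mean of g is zero).
emp_cov = g @ g.T / draws

# LMM view: g ~ N(0, sigma2_g * K). The two covariances should agree
# up to Monte Carlo error.
max_abs_diff = np.max(np.abs(emp_cov - sigma2_g * K))
print(f"max |empirical - sigma2_g * K| = {max_abs_diff:.4f}")
```

The identity Cov(X beta) = X (sigma2_g / M) I X^T = sigma2_g K holds exactly; the sampled version only illustrates it, with an error that shrinks as the number of draws grows.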
