Abstract

Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects, improves association estimates and in 9,448 individuals, 75.7% (95% CI 71.70–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age-, and tissue-specificity, implying associations are a phenotypic consequence rather than causal.

Highlights

  • Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification

  • Our approach assumes that the observed phenotype y is reflected by a linear combination of genetic effects estimated from single-nucleotide polymorphism (SNP) data, epigenetic effects estimated from probes on an array, along with age and sex specific effects (α, γ), such that: y 1⁄4 αage þ γsex þ Xcpg βcpg þ XGβG þ ε ð1Þ

  • Unlike previous approaches that assign a mixture of Gaussian distributions and a discrete spike at zero as a prior for all effects, we assign a new set of mixtures to each group, effectively augmenting the number of hyperparameters proportionally to the number of groups of variables

Read more

Summary

Results

Unlike previous approaches that assign a mixture of Gaussian distributions and a discrete spike at zero as a prior for all effects, we assign a new set of mixtures to each group, effectively augmenting the number of hyperparameters proportionally to the number of groups of variables (two in the case of SNPs and methylation probes) This allows for non-identifiable effects to be excluded from the model, whereas the rest are estimated jointly. We provide an example of how the model can be extended to allow probe effects to be estimated accounting for genomic annotation information, providing estimates of genomic enrichment that do not reply on post hoc testing This flexibility is important as data sets will likely be variable in their structure and the degree to which different “omics” measures are correlated. We conduct additional simulations with a wide a range of settings, varying the cell-type

Methods
10 BMI Smoking
Discussion
Method
C1 3 σ σ266664
Code availability
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.