Abstract

Background: Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. It is of significant scientific interest to unravel the role of potentially high-dimensional intermediate phenotypes in the relationship between environmental exposure and traits. Mediation analysis is an important tool for investigating such relationships. However, it has mainly focused on low-dimensional settings, and there is a lack of a good measure of the total mediation effect. Here, we extend an R-squared (R^2) effect size measure, originally proposed in the single-mediator setting, to the moderate- and high-dimensional mediator settings in the mixed model framework.

Results: Based on extensive simulations, we compare our measure and estimation procedure with several frequently used mediation measures, including product, proportion, and ratio measures. Our R^2-based second-moment measure has small bias and variance under the correctly specified model. To mitigate potential bias induced by non-mediators, we examine two variable selection procedures, i.e., iterative sure independence screening and false discovery rate control, to exclude the non-mediators. We establish the consistency of the proposed estimation procedures and introduce a resampling-based confidence interval. By applying the proposed estimation procedure, we found that 38% of the age-related variations in systolic blood pressure can be explained by gene expression profiles in the Framingham Heart Study of 1711 individuals. An R package “RsqMed” is available on CRAN.

Conclusion: R-squared (R^2) is an effective and efficient measure of the total mediation effect, especially in the high-dimensional setting.

Highlights

  • Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes

  • The set of true mediators is denoted as M, the set of variables associated with exposure but not with outcome is denoted as M(1), and the set of variables associated with outcome but not the exposure is denoted as M(2)

  • Under the low-dimensional setting, we found that mixed-effect models performed slightly better in estimating R^2_Mediated and the shared over simple effect (SOS), as defined in Methods, compared with fixed-effect models; fixed-effect models performed better in estimating the product, proportion, and ratio measures (Additional file 1: Table S1)
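
To make the R^2 mediated measure and SOS named in the last highlight concrete, here is a minimal sketch for the low-dimensional, fixed-effect case. It assumes the commonality-style definition from the single-mediator literature, R^2_Mediated = R^2(Y~M) + R^2(Y~X) - R^2(Y~M+X), and SOS = R^2_Mediated / R^2(Y~X); the Methods section is not reproduced here, so treat both formulas as assumptions. The code uses ordinary least squares on simulated data rather than the mixed model described in the abstract, and it is not the RsqMed package API. The names `r_squared` and `r2_mediated` are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

def r2_mediated(y, x, M):
    """Assumed definition: R^2(Y~M) + R^2(Y~X) - R^2(Y~M+X), plus SOS."""
    r2_yx = r_squared(y, x)                          # exposure only
    r2_ym = r_squared(y, M)                          # mediators only
    r2_ymx = r_squared(y, np.column_stack([M, x]))   # mediators and exposure
    r2_med = r2_ym + r2_yx - r2_ymx
    sos = r2_med / r2_yx                             # shared over simple effect
    return r2_med, sos

# Simulated example (illustrative only): x acts on y solely through 3 mediators.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
M = 0.5 * x[:, None] + rng.normal(size=(n, 3))
y = 0.4 * M.sum(axis=1) + rng.normal(size=n)

r2_med, sos = r2_mediated(y, x, M)
print(f"R^2 mediated = {r2_med:.3f}, SOS = {sos:.3f}")
```

Because the simulated exposure has no direct effect on the outcome, SOS should come out close to 1 here. With many candidate mediators, a plain OLS fit like this one overfits, which is the situation the mixed-model estimation and variable selection steps in the abstract are meant to handle.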


Summary

Introduction

Environmental exposures can regulate intermediate molecular phenotypes, such as gene expression, by different mechanisms and thereby lead to various health outcomes. Mediation analysis is an important tool for investigating such relationships. It has mainly focused on low-dimensional settings, and there is a lack of a good measure of the total mediation effect. Understanding the relationships between an environmental risk factor and health traits through molecular phenotypes, such as gene expression (GE) and DNA methylation, can provide mechanistic insights into disease etiology and exposure biology. There is growing evidence that many of these intermediate phenotypes could lie in the pathway between environmental exposure and downstream health outcomes [1, 2]. It is of great scientific interest to measure the overall contribution of different types of molecular phenotypes in the pathways from an environmental risk factor to a phenotype endpoint. Mediation analysis is a natural approach to exploring such relationships, and it can help researchers delineate why and how two variables (a dependent variable and an independent variable) are related [3].

