Abstract

Background: Recent large-scale genetic studies often involve clustered phenotypes such as repeated measurements. Compared to a series of univariate analyses of single phenotypes, an analysis of clustered phenotypes can substantially increase the statistical power to detect genetic associations. Moreover, for the analysis of rare variants, incorporating biological information can boost the weak effects of the rare variants.

Results: Through simulation studies, we showed that the proposed method outperforms other methods currently available for pathway-level analysis of clustered phenotypes. Moreover, a real data analysis using a large-scale whole exome sequencing dataset of 995 samples with metabolic syndrome-related phenotypes successfully identified the glyoxylate and dicarboxylate metabolism pathway, which could not be identified by the univariate analyses of single phenotypes or by other existing methods.

Conclusion: In this paper, we introduced a novel pathway-level association test that combines hierarchical structured components analysis and penalized generalized estimating equations. The proposed method analyzes all pathways in a single unified model while considering their correlations. A C/C++ implementation of PHARAOH-GEE is publicly available at http://statgen.snu.ac.kr/software/pharaoh-gee/.

Highlights

  • The proposed method is an extension of the doubly-regularized Generalized Structured Component Analysis into the Generalized Estimating Equations (GEE) framework [27] that imposes ridge penalties [28] on both gene-pathway and pathway-phenotype relationships

  • We propose a novel pathway-level association test for clustered and correlated phenotypes such as repeated measurements: Pathway-based approach using HierArchical component of collapsed RAre variants Of High-throughput sequencing data using Generalized Estimating Equations (PHARAOH-GEE)

  • The proposed PHARAOH-GEE method was applied to the 10 pathways simultaneously, whereas GEEaSPU was applied to each pathway individually


Summary

Methods

The proposed method extends the doubly-regularized Generalized Structured Component Analysis to the GEE framework [27] and imposes ridge penalties [28] on both the gene-pathway and the pathway-phenotype relationships. We previously demonstrated that these two ridge penalties effectively control the correlations between genes and pathways [9, 10]. PHARAOH-GEE aims to identify associations between Q clustered phenotypes and K pathways, each of which is linked to T_k genes (k = 1, ⋯, K). As in the previous description of the PHARAOH model [9], we assume that y_iq follows an exponential family distribution with mean μ_iq. Let Σ_i be the Q × Q covariance matrix of ỹ_i.
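To make the structure above more concrete, the following is a minimal sketch for a single sample, not the PHARAOH-GEE implementation (which is in C/C++). It assumes that each pathway score is a weighted sum of that pathway's collapsed gene values, that the Q phenotype means are linear in the K pathway scores through an identity link, that Σ_i takes the standard GEE working form φ A^{1/2} R A^{1/2} with an exchangeable correlation R, and that the two ridge penalties are half the squared norms of the gene-pathway weights and the pathway-phenotype coefficients. All function names, the link, the correlation structure, and the numeric values are hypothetical choices for illustration.

```python
import math

def pathway_scores(gene_values, weights):
    """One score per pathway: weighted sum of its T_k gene values (assumed form)."""
    return [sum(x * w for x, w in zip(xs, ws))
            for xs, ws in zip(gene_values, weights)]

def phenotype_means(scores, betas):
    """mu_q = sum_k scores[k] * betas[k][q]; identity link is an assumption."""
    n_pheno = len(betas[0])
    return [sum(s * b[q] for s, b in zip(scores, betas))
            for q in range(n_pheno)]

def working_covariance(mu, variance_fn, rho, phi=1.0):
    """Q x Q matrix phi * A^(1/2) R A^(1/2) with exchangeable R (assumed)."""
    sd = [math.sqrt(variance_fn(m)) for m in mu]
    n = len(mu)
    return [[phi * sd[a] * sd[b] * (1.0 if a == b else rho)
             for b in range(n)] for a in range(n)]

def ridge_penalty(weights, betas, lam_gene, lam_pathway):
    """Two ridge terms: gene-pathway weights and pathway-phenotype coefficients."""
    gene_term = sum(w * w for ws in weights for w in ws)
    path_term = sum(b * b for bs in betas for b in bs)
    return 0.5 * lam_gene * gene_term + 0.5 * lam_pathway * path_term

# Toy sample: K = 2 pathways with T = (3, 2) genes, Q = 3 phenotypes.
gene_values = [[1.0, 0.0, 2.0], [0.0, 1.0]]    # collapsed gene values per pathway
weights = [[1.0, 1.0, 1.0], [1.0, 1.0]]        # gene-pathway weights
betas = [[0.5, 0.0, 1.0], [1.0, 2.0, 0.0]]     # pathway-phenotype coefficients

scores = pathway_scores(gene_values, weights)
mu = phenotype_means(scores, betas)
# Constant variance function, as for a Gaussian phenotype.
sigma = working_covariance(mu, lambda m: 1.0, rho=0.5, phi=2.0)
penalty = ridge_penalty(weights, betas, lam_gene=2.0, lam_pathway=1.0)
```

In this toy setting the two pathways share one set of coefficients per phenotype, which mirrors the idea of analyzing all pathways in a single model; the penalty weights `lam_gene` and `lam_pathway` would in practice need to be tuned, for example by cross-validation.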
