Abstract

New high-throughput sequencing technologies have brought forth opportunities for unbiased analysis of thousands of rare genomic variants in genome-wide association studies of complex diseases. Because it is hard to detect single rare variants with appreciable effect sizes at the population level, existing methods mostly aggregate effects of multiple markers by collapsing the rare variants in genes (or genomic regions). We hypothesize that a higher level of aggregation can further improve association signal strength. Using the Genetic Analysis Workshop 17 simulated data, we test a two-step strategy that first applies a collapsing method in a gene-level analysis and then aggregates the gene-level test results by performing an enrichment analysis in gene sets. We find that the gene set approach, which combines signals across multiple genes, outperforms testing individual genes separately, and that the power of the gene set enrichment test is further improved by proper adjustment of statistics to account for gene-wise differences.

Highlights

  • It is debatable whether individual common single-nucleotide polymorphisms (SNPs) or multiple rare variants underlie common human diseases; the truth is probably somewhere in between, in that both types of variants might be important

  • We previously developed an extension of gene set enrichment analysis (GSEA), called variable set enrichment analysis (VSEA), for improved enrichment analysis in genome-wide association studies (GWAS) [5]

  • We tested an improved method for enrichment analysis on top of gene-based collapsing methods for combining association signals across multiple genes


Summary

Background

It is debatable whether individual common single-nucleotide polymorphisms (SNPs) or multiple rare variants underlie common human diseases; the truth is probably somewhere in between, in that both types of variants might be important. Popular collapsing methods include the combined multivariate and collapsing (CMC) method [1] and the weighted-sum test [2]. These methods combine the effects of many SNPs in a chromosome region (e.g., the vicinity of a candidate gene) to enhance statistical power. One of the main issues addressed by our variable set enrichment analysis (VSEA) extension of gene set enrichment analysis (GSEA) is how to properly normalize the statistics for aggregation, so as to compensate for distributional differences that result from differing gene or gene set sizes or from complicated interactions. This issue becomes more important when both rare and common SNPs are included in the analysis. The two-step approach (gene-level collapsing followed by gene-set-level enrichment analysis) is applied to the Genetic Analysis Workshop 17 (GAW17) simulated data with knowledge of the underlying simulation model; we test both the CMC and the weighted-sum tests for gene-level analysis and both VSEA and GSEA for gene-set-level enrichment analysis.
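
The two-step idea described above (collapse rare variants within a gene, then aggregate normalized gene-level statistics within a gene set) can be illustrated with a minimal sketch. This is not the CMC, weighted-sum, GSEA, or VSEA procedure from the paper: the carrier indicator, the two-proportion z statistic, the null-based standardization, and the Stouffer-style combination are simplified stand-ins chosen for illustration, and all function names, the MAF threshold of 0.01, and the data layout (rows are individuals, columns are variants coded as minor-allele counts) are assumptions.

```python
import math
from statistics import mean, pstdev

def collapse_rare(genotypes, mafs, maf_threshold=0.01):
    """Step 1a (illustrative): 1 if an individual carries any rare minor allele in the gene."""
    rare_idx = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [1 if any(row[j] > 0 for j in rare_idx) else 0 for row in genotypes]

def gene_z(carrier, is_case):
    """Step 1b (illustrative): two-proportion z comparing carrier rates in cases vs. controls."""
    n1 = sum(is_case)
    n0 = len(is_case) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    c1 = sum(c for c, y in zip(carrier, is_case) if y)
    c0 = sum(c for c, y in zip(carrier, is_case) if not y)
    p = (c1 + c0) / (n1 + n0)                      # pooled carrier frequency
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n0))
    return 0.0 if se == 0 else (c1 / n1 - c0 / n0) / se

def standardize(observed, null_stats):
    """Normalize a gene statistic against its own null distribution
    (e.g., values from label permutations), so genes of different
    sizes are on a comparable scale before aggregation."""
    sd = pstdev(null_stats)
    return 0.0 if sd == 0 else (observed - mean(null_stats)) / sd

def gene_set_score(z_by_gene, gene_set):
    """Step 2 (illustrative): Stouffer-style combination, sum of z over sqrt(k)."""
    zs = [z_by_gene[g] for g in gene_set]
    return sum(zs) / math.sqrt(len(zs))
```

In use, one would call `collapse_rare` and `gene_z` per gene, pass each result through `standardize` with that gene's own permutation null, and then hand the standardized values to `gene_set_score`. Standardizing per gene before combining is the design point the text emphasizes: without it, genes with many variants can dominate the set-level statistic simply because their raw statistics follow a different distribution.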
