Abstract

Background: The application of pathway and gene-set based analyses to high-throughput data is increasingly common and represents an effort to understand underlying biology where single-gene or single-marker analyses have failed. Many such analyses rely on the a priori identification of genes associated with the trait of interest. In contrast, this variance-component–based approach creates a similarity matrix of individuals based on the expression of genes in each pathway.

Methods: We compared 16 methods of calculating similarity for positive control matrices based on probes for the genes used to model the simulated Genetic Analysis Workshop phenotypes.

Results: A simple correlation matrix outperforms the other methods by identifying pathways associated with the simulated phenotypes at nearly twice the rate expected based on the associations of the component transcripts and an approximate false-positive rate of 0.05.

Conclusions: This method has a number of additional advantages compared to single-transcript and pathway overrepresentation analyses, including the ability to estimate the proportion of variation explained by each pathway and the logistical advantage of only calculating the distance matrices once for each messenger RNA data set regardless of the number of phenotypes. Additionally, it offers a significant reduction in the multiple testing burden over individual consideration of each probe.
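To make the idea of a pathway-level similarity matrix concrete, the following minimal sketch computes a correlation-based similarity between individuals from the probes of one pathway. It is an illustration under stated assumptions, not the authors' implementation: the function names `pearson` and `pathway_similarity`, the toy data, and the choice to correlate raw (unstandardized) probe profiles are all hypothetical, and the abstract does not specify any preprocessing.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile has no defined correlation; 0.0 is an assumption.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pathway_similarity(expression):
    """Build an individuals-by-individuals correlation matrix.

    `expression` is a list with one row per individual; each row holds the
    expression values of the probes mapped to a single pathway.
    """
    n = len(expression)
    sim = [[1.0] * n for _ in range(n)]  # diagonal fixed at 1.0
    for i in range(n):
        for j in range(i + 1, n):
            r = pearson(expression[i], expression[j])
            sim[i][j] = sim[j][i] = r  # matrix is symmetric
    return sim


# Toy data (made up): three individuals, four probes in one pathway.
toy_expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
similarity = pathway_similarity(toy_expression)
```

Because the matrix depends only on the expression data, it can be computed once per pathway and reused for every phenotype, which is the logistical advantage the abstract mentions.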

Highlights

  • We considered each similarity matrix separately as an additional variance component and applied a likelihood ratio test (LRT) to determine if the positive control pathway explains significantly more of the variation in the phenotype than kinship alone

  • The probes in the top decile are associated with diastolic blood pressure (DBP) in an average of 59.1 simulations (29.6%), whereas those in the bottom decile have an average of just 2.04 associations in the 200 simulations (1.0%)
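The likelihood ratio test mentioned in the highlights can be sketched as follows. This is a hedged illustration, not the procedure confirmed by the source: the function name `lrt_variance_component` and the log-likelihood values are hypothetical, and the null distribution is assumed to be the 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom that is commonly used when a single variance component is tested on the boundary of its parameter space. The source does not state which null distribution was used.

```python
import math


def lrt_variance_component(loglik_null, loglik_alt):
    """Likelihood ratio test for one added variance component.

    loglik_null: maximized log-likelihood of the kinship-only model.
    loglik_alt:  maximized log-likelihood of the kinship + pathway model.
    Returns (statistic, p_value).
    """
    # Nested models: the statistic cannot be negative in theory, so clamp
    # small negative values that arise from numerical optimization.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return stat, 1.0
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_chi2_1df = math.erfc(math.sqrt(stat / 2.0))
    # ASSUMPTION: 50:50 mixture of chi-square(0) and chi-square(1) under the
    # null; the point mass at zero contributes nothing for stat > 0.
    return stat, 0.5 * p_chi2_1df


# Made-up log-likelihoods for illustration only.
statistic, p_value = lrt_variance_component(loglik_null=-100.0, loglik_alt=-98.08)
```

Under this assumed mixture, a statistic of about 3.84 corresponds to a p-value near 0.025 rather than 0.05, so the choice of null distribution directly affects the false-positive rate.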

Introduction

The application of pathway and gene-set based analyses to high-throughput data is increasingly common and represents an effort to understand underlying biology where single-gene or single-marker analyses have failed. Many such analyses rely on the a priori identification of genes associated with the trait of interest. Pathway and gene-set enrichment analyses were developed with several goals, including increasing the biological interpretability of genetic association and RNA expression analyses [1]. Because these pathway tests are based on the results of gene- or probe-based prior analyses, they rely on aggregation of individual effects. This has the advantage of implicitly aggregating across effects of individual probes in the pathway, thereby allowing the pathway to become the level of
