Abstract

BackgroundThe Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive.ResultsWe developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations.ConclusionkruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.

Highlights

  • The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive

  • A new approach (“matrix-eQTL”) was developed which uses the fact that the test statistics for the additive linear regression and ANOVA models can be expressed as multiplications between rescaled genotype and expression data matrices, thereby realising a dramatic speed-up compared to traditional QTL-mapping algorithms [6]

  • We describe an application of kruX on a human dataset of 19,610 genes and 530,222 single-nucleotide polymorphisms (SNPs) markers measured in 102 whole blood samples from the Stockholm Atherosclerosis Gene Expression (STAGE) study [13]

Read more

Summary

Results

We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion: kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com

Background
Results and discussion
10 Linear
Conclusions
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call