Abstract

Background: Determining interacting SNPs in genome-wide association studies is computationally expensive yet of considerable interest in genomics.

Findings: We present a program, Chi8, that calculates the 8-degree-of-freedom chi-square test between all pairs of SNPs by brute force on a Graphics Processing Unit (GPU). We analyze each of the seven WTCCC genome-wide association studies, each with about 5000 cases and controls in total and 400,000 SNPs, in an average of 9.6 h on a single GPU. We also study the power, false positives, and area under the curve (AUC) of our program on simulated data and provide a comparison with the GBOOST program. Our program's source code is freely available from http://www.cs.njit.edu/usman/Chi8.

Electronic supplementary material: The online version of this article (doi:10.1186/s13104-015-1392-5) contains supplementary material, which is available to authorized users.
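The brute-force pairwise test summarized above can be sketched on the CPU in Python. This is a minimal, non-authoritative sketch, not the Chi8 GPU code. It assumes the standard Pearson chi-square on a 2 × 9 contingency table (case/control status by the nine joint genotypes of the two SNPs), which gives (2 − 1)(9 − 1) = 8 degrees of freedom. It also assumes each SNP column is a string over '0', '1', '2' with all cases listed first. The function names `chi2_8df` and `all_pairs` are hypothetical.

```python
from itertools import combinations


def chi2_8df(snp_a, snp_b, num_cases):
    """Pearson chi-square statistic for one SNP pair (8 degrees of freedom).

    snp_a, snp_b: strings over '0', '1', '2', one character per subject,
    with all cases before all controls (assumed input layout).
    """
    n = len(snp_a)
    # 2 x 9 table: row 0 = cases, row 1 = controls; column = 3 * g_a + g_b.
    table = [[0] * 9 for _ in range(2)]
    for i in range(n):
        status = 0 if i < num_cases else 1
        table[status][3 * int(snp_a[i]) + int(snp_b[i])] += 1

    row_totals = [sum(row) for row in table]
    col_totals = [table[0][k] + table[1][k] for k in range(9)]

    stat = 0.0
    for r in range(2):
        for k in range(9):
            expected = row_totals[r] * col_totals[k] / n
            # Skip genotype pairs that never occur (expected count of zero);
            # this is an assumed convention for empty columns.
            if expected > 0:
                stat += (table[r][k] - expected) ** 2 / expected
    return stat


def all_pairs(matrix, num_cases):
    """Brute-force 8-df test over every pair of SNPs.

    matrix: n strings of length m (one row per subject, cases first).
    Returns {(a, b): statistic} for all SNP indices a < b.
    """
    m = len(matrix[0])
    # Extract each SNP column once so the pair loop works on columns.
    columns = ["".join(row[j] for row in matrix) for j in range(m)]
    return {
        (a, b): chi2_8df(columns[a], columns[b], num_cases)
        for a, b in combinations(range(m), 2)
    }
```

For example, `all_pairs(["00", "00", "22", "22"], 2)` returns `{(0, 1): 4.0}`, because the two SNPs perfectly separate the two cases from the two controls. A statistic can be turned into a p-value with an 8-df chi-square distribution (for example `scipy.stats.chi2.sf(stat, 8)`). The loop visits m(m − 1)/2 pairs, about 8.0 × 10^10 for m = 400,000, so a pure-Python version is only practical for small inputs; that quadratic pair count is what makes a GPU attractive for this test.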

Highlights

  • We ran GBOOST with a screening threshold (the BOOST interaction threshold) of 37. We obtained this value by starting from the program's default and increasing it until the power matched previously published values for model 1 with an allele frequency of 0.1 [14]


Summary

Methods

Chi algorithm

Our program, which we call Chi, computes the 8-df chi-square test between all pairs of SNPs in parallel. A GWAS is a matrix of SNPs in which each SNP genotype is given by a string of two letters, each taking one of the values A, C, G, and T. We convert each SNP into ‘0’, ‘1’, or ‘2’ to represent the number of copies of the allele with the larger alphabetical value [15, 16]. In this numeric format, the GWAS is an n by m matrix of characters taking the values ‘0’, ‘1’, and ‘2’, where n is the number of subjects and m is the number of SNPs. We assume that all case subjects appear before the controls in the GWAS.
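The conversion step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code. The function names `encode_snp` and `encode_gwas` are hypothetical, and missing genotypes are not handled. The "larger" allele is taken as the alphabetically larger of the letters observed at each SNP; for a monomorphic SNP that is its only letter, so every subject is coded ‘2’.

```python
def encode_snp(genotypes):
    """Encode one SNP column of two-letter genotypes as '0', '1', or '2'.

    Each code is the number of copies of the alphabetically larger allele
    among the letters observed at this SNP (assumed interpretation).
    """
    if not genotypes or any(len(g) != 2 for g in genotypes):
        raise ValueError("each genotype must be a two-letter string")
    letters = set("".join(genotypes))
    if not letters <= set("ACGT"):
        raise ValueError("genotype letters must be A, C, G, or T")
    larger = max(letters)  # e.g. 'G' for an A/G SNP, 'T' for a C/T SNP
    return [str(g.count(larger)) for g in genotypes]


def encode_gwas(case_rows, control_rows):
    """Build the n by m matrix of '0'/'1'/'2' characters, cases first.

    case_rows, control_rows: lists of subjects, each a list of m genotype
    strings such as "AG". Returns (matrix, num_cases), where matrix is a
    list of n strings of length m.
    """
    rows = case_rows + control_rows  # cases must precede controls
    m = len(rows[0])
    columns = [encode_snp([row[j] for row in rows]) for j in range(m)]
    # Transpose the encoded columns back into one string per subject.
    matrix = ["".join(col[i] for col in columns) for i in range(len(rows))]
    return matrix, len(case_rows)
```

For instance, with cases `[["AG", "CC"], ["GG", "CT"]]` and controls `[["AA", "TT"]]`, `encode_gwas` returns `(["10", "21", "02"], 2)`: G is the larger allele at the first SNP and T at the second.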
