Abstract
Background: Determining interacting SNPs in genome-wide association studies is computationally expensive yet of considerable interest in genomics.
Findings: We present a program, Chi8, that calculates the Chi-square 8-degree-of-freedom test between all pairs of SNPs in a brute-force manner on a Graphics Processing Unit (GPU). We analyze each of the seven WTCCC genome-wide association studies, which have about 5000 total cases and controls and 400,000 SNPs, in an average of 9.6 h on a single GPU. We also study the power, false positives, and area under the curve of our program on simulated data and provide a comparison to the GBOOST program. Our program source code is freely available from http://www.cs.njit.edu/usman/Chi8.
Electronic supplementary material: The online version of this article (doi:10.1186/s13104-015-1392-5) contains supplementary material, which is available to authorized users.
Highlights
Determining interacting SNPs in genome-wide association studies is computationally expensive yet of considerable interest in genomics
In the numeric format the genome-wide association study (GWAS) is given by an n by m matrix of characters taking on the values ‘0’, ‘1’, and ‘2’ where n is the number of subjects and m is the number of SNPs
We ran GBOOST with a screen threshold (BOOST interaction threshold) of 37. We obtained this value by starting from the program default and increasing it until the power equaled previously published values for model 1 at an allele frequency of 0.1 [14]
Summary
Chi algorithm: Our program, which we call Chi, computes the Chi-square 8-df test between all pairs of SNPs in parallel. A GWAS is a matrix of SNPs where each SNP is given by a string of two letters, each taking on the values A, C, G, and T. We convert each SNP into ‘0’, ‘1’, and ‘2’ to represent the number of copies of the allele with the larger alphabet value [15, 16]. In the numeric format the GWAS is given by an n by m matrix of characters taking on the values ‘0’, ‘1’, and ‘2’, where n is the number of subjects and m is the number of SNPs. We assume that all case subjects appear before controls in the GWAS.
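The encoding and the pairwise test described above can be illustrated with a minimal sequential sketch in Python. This is not the authors' GPU implementation: the function names (`encode_snp`, `chi_square_8df`, `all_pairs`) are hypothetical, genotypes are stored as integers rather than characters for convenience, and skipping genotype combinations that never occur (expected count of zero) is an assumption made here to avoid division by zero. The statistic is the standard Pearson Chi-square on a 2 by 9 table (case/control status against the nine joint genotypes of two SNPs), which has 8 degrees of freedom.

```python
from itertools import combinations


def encode_snp(genotypes):
    """Encode two-letter genotypes (e.g. 'AG') as 0, 1, or 2: the number of
    copies of the allele with the larger alphabet value."""
    alleles = sorted({allele for g in genotypes for allele in g})
    if len(alleles) > 2:
        raise ValueError("expected a biallelic SNP")
    counted = alleles[-1]  # allele that sorts last alphabetically
    return [g.count(counted) for g in genotypes]


def chi_square_8df(snp_a, snp_b, n_cases):
    """Pearson Chi-square statistic on the 2 x 9 case/control by joint-genotype
    table. Assumes the first n_cases subjects are cases (cases before controls)."""
    n = len(snp_a)
    counts = [[0] * 9 for _ in range(2)]  # row 0: cases, row 1: controls
    for i, (a, b) in enumerate(zip(snp_a, snp_b)):
        row = 0 if i < n_cases else 1
        counts[row][3 * a + b] += 1  # joint genotype index in 0..8

    row_totals = [sum(row) for row in counts]
    col_totals = [counts[0][j] + counts[1][j] for j in range(9)]

    stat = 0.0
    for r in range(2):
        for j in range(9):
            expected = row_totals[r] * col_totals[j] / n
            if expected > 0:  # assumption: skip unobserved joint genotypes
                stat += (counts[r][j] - expected) ** 2 / expected
    return stat


def all_pairs(snps, n_cases):
    """Brute-force evaluation of every SNP pair; snps is a list of encoded SNPs."""
    return {
        (i, j): chi_square_8df(snps[i], snps[j], n_cases)
        for i, j in combinations(range(len(snps)), 2)
    }
```

Each pair is independent of every other pair, which is what makes the brute-force search amenable to parallel evaluation; in this sketch the pairs are simply visited one after another.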