Abstract
Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway and/or interact non-additively (on log risk, for example) may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable because even modest-sized SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed explicitly to overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPUs). We include tutorials on GPU technology, which will convey why GPUs are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores available on a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores, and each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well because of the high degree of parallelism exposed in these problems. The papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
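The scale of the search space described above is easy to check by hand. The following is a minimal sketch, assuming one model per unordered set of SNPs; the function name `interaction_model_count` is illustrative and does not come from any of the programs reviewed.

```python
from math import comb

def interaction_model_count(n_snps, order=2):
    """Number of distinct SNP subsets of size `order` (one model per subset)."""
    return comb(n_snps, order)

n_snps = 500_000  # panel size quoted in the abstract
pairs = interaction_model_count(n_snps)        # pairwise interactions
triples = interaction_model_count(n_snps, 3)   # three-way interactions

print(f"{pairs:,} pairwise models")
print(f"{triples:.2e} three-way models")
```

For a 500k-marker panel the pairwise count is 124,999,750,000, on the order of 10^11, and the three-way count is roughly 2 × 10^16, which is why an exhaustive scan is out of reach without heavy parallelism.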
Highlights
Large-scale, population-based genome-wide association studies (GWAS) of complex disease have been highly effective at elucidating hereditary risk factors
After installing the four programs described above on a 64-bit Linux host equipped with two nVidia Tesla K20 Graphics Processing Units (GPUs), we initially executed each program using included sample data to verify compatibility
Both methods display comparable power at this effect size, although epiGPU appears to carry a slight advantage at the lowest false positive rates
Summary
Large-scale, population-based genome-wide association studies (GWAS) of complex disease have been highly effective at elucidating hereditary risk factors. Statistical geneticists have long been interested in developing methods to detect epistasis (Cordell, 2002; Huang et al., 2013). Tackling this challenge requires a multi-pronged strategy of advances in statistical methodology, clever optimization algorithms (e.g., those requiring fewer iterations to converge to a solution), and efficient implementations that extract the full potential of state-of-the-art many-core processors. Although there is a rich array of statistical methods designed to search for epistasis, only a few have been developed into code that can make use of GPUs. We encourage readers who are interested in applying or developing state-of-the-art methods to consider implementation strategies that make use of the parallelism in modern processors. This tutorial revolves around a short working example that interested readers can extend toward more interesting problems.
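To illustrate the kind of parallelism referred to above, the sketch below shows one common way to split a pairwise scan into independent work items: each flat index maps to exactly one SNP pair, so every worker can compute its own pair without coordinating with the others. This is plain Python rather than GPU code, and it is an assumed decomposition for illustration only, not the scheme used by any of the programs reviewed; `pair_from_index` is a hypothetical helper.

```python
from itertools import combinations
from math import isqrt

def pair_from_index(k):
    """Map a flat work-item index k >= 0 to a unique SNP pair (i, j) with i < j."""
    j = (1 + isqrt(1 + 8 * k)) // 2   # largest j such that j*(j-1)/2 <= k
    i = k - j * (j - 1) // 2          # offset inside that j's block, 0 <= i < j
    return i, j

n_snps = 6
n_pairs = n_snps * (n_snps - 1) // 2

# Each index could be handled by a separate worker (e.g. one thread per pair).
pairs = [pair_from_index(k) for k in range(n_pairs)]

# Sanity check: the mapping covers every unordered pair exactly once.
assert sorted(pairs) == list(combinations(range(n_snps), 2))
```

Because the mapping uses integer arithmetic only (`isqrt`), it stays exact even for the very large indices that arise with hundreds of thousands of markers.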