Abstract

Background: Current genomics methods and pipelines were designed to handle tens to thousands of samples, but will soon need to scale to millions to keep up with the pace of data and hypothesis generation in biomedical science. The computational costs associated with processing these growing datasets will become prohibitive without improving the computational efficiency and scalability of methods. For example, methods in population genetics, such as genome-wide association studies (GWAS) or mapping of quantitative trait loci (QTL), involve billions of regressions between genotypes and phenotypes. Currently, the state-of-the-art infrastructure for performing these tasks consists of large-scale clusters of central processing units (CPUs), often with thousands of cores, which incur significant cost (960 cores on standard Google Cloud machines cost $7,660.80 per day of compute). In contrast to CPUs, which feature relatively few cores (e.g., Intel's i9 has 6 cores), a single graphical processing unit (GPU) contains thousands of cores (e.g., Nvidia's P100 has 3,584 cores). Here, we show that implementing genomics methods using recently developed machine-learning libraries for GPUs significantly accelerates computations and enables scaling to hundreds of thousands of samples.

Results: To demonstrate this and benchmark the use of machine-learning libraries for large-scale genomic analyses, we re-implemented methods for two commonly performed computational genomics tasks: (i) QTL mapping (tensorQTL) and (ii) Bayesian non-negative matrix factorization (SignatureAnalyzer-GPU). To benchmark tensorQTL, we generated random data representing up to 50,000 people (with 10⁷ variants), resulting in 500 × 10⁹ all-against-all association tests. Our implementation enabled cis-QTL mapping >250× faster than the current state-of-the-art implementation (FastQTL). Likewise, trans-QTL mapping (i.e., 500 billion regressions) took less than 10 minutes, a ~200× speedup compared to running without a GPU. To benchmark SignatureAnalyzer-GPU (SA-GPU), we used the mutation counts matrix generated by the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which contains 2,624 tumors represented by 1,697 mutational features of somatic single-nucleotide variants and short insertions and deletions (defined based on their sequence contexts). Our GPU implementation ran approximately 200 times faster than the current implementation of SignatureAnalyzer (SA) in R, with a mean time for 10,000 iterations of 194.8 min using SA vs. 1.09 min using SA-GPU.

Conclusion: We anticipate that the accessibility of these libraries (e.g., TensorFlow, PyTorch), and the improvements in run-time, will lead to a transition to GPU-based implementations for a wide range of computational genomics methods.

Citation Format: Amaro N. Taylor-Weiner, François Aguet, Nicholas Haradhvala, Sager Gosai, Jaegil Kim, Kristin Ardlie, Eliezer M. Van Allen, Gad Getz. Scaling computational genomics to millions of individuals with GPUs [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 2473.
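The QTL speedups above come from expressing the billions of genotype-phenotype regressions as dense linear algebra that a GPU executes in large batches. Below is a minimal PyTorch sketch of this idea; it is not the tensorQTL API (the function and variable names here are hypothetical), and it assumes pre-normalized data with no covariates. All variant-phenotype correlations are computed in a single matrix product and converted to t-statistics:

import torch

def all_pairs_assoc(genotypes, phenotypes):
    # genotypes:  (num_variants, num_samples) array
    # phenotypes: (num_phenotypes, num_samples) array
    # returns:    (num_variants, num_phenotypes) t-statistics,
    #             one per genotype-phenotype regression
    device = "cuda" if torch.cuda.is_available() else "cpu"
    G = torch.as_tensor(genotypes, dtype=torch.float32, device=device)
    Y = torch.as_tensor(phenotypes, dtype=torch.float32, device=device)
    n = G.shape[1]
    # Center each row and scale to unit norm so that G @ Y.T yields the
    # Pearson correlation for every variant-phenotype pair at once.
    G = G - G.mean(dim=1, keepdim=True)
    Y = Y - Y.mean(dim=1, keepdim=True)
    G = G / G.norm(dim=1, keepdim=True)
    Y = Y / Y.norm(dim=1, keepdim=True)
    r = G @ Y.T  # all-against-all correlations in one matrix product
    # Convert correlations to t-statistics with n - 2 degrees of freedom.
    return r * torch.sqrt((n - 2) / (1.0 - r ** 2))

In practice the variant dimension is processed in chunks that fit in GPU memory, but each chunk remains a single matrix multiplication, which is the operation GPUs accelerate most.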
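Similarly, the core of the matrix-factorization benchmark is an iterative update loop built from tensor operations. The sketch below shows standard multiplicative-update NMF with a Kullback-Leibler objective in PyTorch; it is a simplified stand-in for SignatureAnalyzer's Bayesian NMF (it omits the automatic relevance determination priors), and the function name is hypothetical:

import torch

def nmf_kl(V, rank, n_iter=10000, eps=1e-10):
    # V: (features, samples) non-negative counts matrix, e.g. mutation
    #    counts per tumor; factorized as V ~ W @ H.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    V = torch.as_tensor(V, dtype=torch.float32, device=device)
    m, n = V.shape
    W = torch.rand(m, rank, device=device)
    H = torch.rand(rank, n, device=device)
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the KL objective; every
        # step is a dense matrix product or an elementwise operation.
        H = H * (W.T @ (V / (W @ H + eps))) / (W.sum(dim=0).unsqueeze(1) + eps)
        W = W * ((V / (W @ H + eps)) @ H.T) / (H.sum(dim=1).unsqueeze(0) + eps)
    return W, H

On the PCAWG data such a loop would operate on a 1,697 × 2,624 counts matrix over 10,000 iterations; because each iteration reduces to dense matrix products, this is the kind of workload where moving from CPU to GPU produces speedups of the magnitude reported above.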
