Abstract

Genomic datasets are steadily growing in size as more genomes are sequenced and new genetic variants are discovered. Datasets that comprise thousands of genomes and millions of single-nucleotide polymorphisms (SNPs) exhibit excessive computational demands that can lead to prohibitively long analyses, making the deployment of high-performance computational approaches a prerequisite for the thorough analysis of current and future large-scale datasets. In this work, we demonstrate that the computational kernel for calculating linkage disequilibrium (LD) in genomes, i.e., the non-random association between alleles at different loci, can be cast in terms of dense linear algebra (DLA) operations, leveraging the collective knowledge of the DLA community in developing high-performance implementations for various microprocessor architectures. The proposed approach for computing LD achieves between 84% and 95% of the theoretical peak performance of the machine and is up to 17X faster than existing LD kernel implementations. Furthermore, we argue that the current trend of increasing the SIMD (Single Instruction, Multiple Data) register width in microprocessors yields only minor benefits for assessing LD, resulting in a widening gap between the performance attainable by LD computations and the theoretical peak of the microprocessor architecture, and suggesting the need for hardware support.
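The core observation, that pairwise LD statistics such as r-squared can be obtained from a single dense matrix-matrix product over a standardized genotype matrix, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (SNPs as rows, samples as columns, genotypes coded as 0/1/2 allele counts), not the paper's actual implementation:

```python
import numpy as np

# Hypothetical genotype matrix: n_snps x n_samples, entries are
# minor-allele counts in {0, 1, 2}. Real datasets would be loaded
# from VCF/PLINK files; here we draw random data for illustration.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(4, 100)).astype(float)

# Standardize each SNP (row) to zero mean and unit variance.
# Assumes no monomorphic SNPs (std > 0 for every row).
Z = (G - G.mean(axis=1, keepdims=True)) / G.std(axis=1, keepdims=True)

# All pairwise correlation coefficients in one dense matrix product
# (a GEMM, the workhorse of high-performance DLA libraries);
# squaring elementwise yields the r^2 LD matrix.
r = (Z @ Z.T) / G.shape[1]
r2 = r ** 2
```

Expressing the kernel as a GEMM is what lets an LD computation inherit the cache blocking and SIMD tuning of optimized BLAS implementations, rather than relying on a hand-written pairwise loop.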
