Abstract

Genomic datasets are steadily growing in size as more genomes are sequenced and new genetic variants are discovered. Datasets that comprise thousands of genomes and millions of single-nucleotide polymorphisms (SNPs) exhibit excessive computational demands that can lead to prohibitively long analyses, making the deployment of high-performance computational approaches a prerequisite for the thorough analysis of current and future large-scale datasets. In this work, we demonstrate that the computational kernel for calculating linkage disequilibrium (LD) in genomes, i.e., the non-random association between alleles at different loci, can be cast in terms of dense linear algebra (DLA) operations, leveraging the collective knowledge of the DLA community in developing high-performance implementations for various microprocessor architectures. The proposed approach for computing LD achieves between 84% and 95% of the theoretical peak performance of the machine and is up to 17X faster than existing LD kernel implementations. Furthermore, we argue that the current trend of increasing the SIMD (Single Instruction, Multiple Data) register width in microprocessors yields only minor benefits for assessing LD, resulting in a widening gap between the performance attainable by LD computations and the theoretical peak of the microprocessor architecture, and suggesting the need for hardware support.
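The core observation, that pairwise LD statistics such as r-squared can be obtained from a single dense matrix-matrix product over a standardized genotype matrix, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (SNPs as rows, samples as columns, genotypes coded as 0/1/2 allele counts), not the paper's actual implementation:

```python
import numpy as np

# Hypothetical genotype matrix: n_snps x n_samples, entries are
# minor-allele counts in {0, 1, 2}. Real datasets would be loaded
# from VCF/PLINK files; here we draw random data for illustration.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(4, 100)).astype(float)

# Standardize each SNP (row) to zero mean and unit variance.
# Assumes no monomorphic SNPs (std > 0 for every row).
Z = (G - G.mean(axis=1, keepdims=True)) / G.std(axis=1, keepdims=True)

# All pairwise correlation coefficients in one dense matrix product
# (a GEMM, the workhorse of high-performance DLA libraries);
# squaring elementwise yields the r^2 LD matrix.
r = (Z @ Z.T) / G.shape[1]
r2 = r ** 2
```

Expressing the kernel as a GEMM is what lets an LD computation inherit the cache blocking and SIMD tuning of optimized BLAS implementations, rather than relying on a hand-written pairwise loop.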
