Padding free bank conflict resolution for CUDA-based matrix transpose algorithm

A Khan, M Al-Mouhamed, M Assayony, A Fatayar, A Almousa, A Baqais

doi:10.1109/snpd.2014.6888709

Abstract

Matrix transposition is an important linear algebra procedure with a deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transposes on graphics processing units (GPUs). The performance degradation stems from memory access patterns, namely non-coalesced access to global memory and bank conflicts in the shared memory of the GPU's streaming multiprocessors. In this paper, two matrix transpose algorithms are proposed that address both issues by ensuring coalesced global memory access and conflict-free shared memory bank access. The proposed algorithms have execution times comparable to the NVIDIA SDK bank-conflict-free matrix transpose implementation. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T × T) of the problem space. By contrast, to the best of our knowledge, previously published approaches need to allocate T × (T + 1) of shared memory. We have also applied the proposed transpose algorithm to the recursive Gaussian implementation in the NVIDIA SDK and achieved about a 6% improvement in performance.
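
The trade-off between a padded T × (T + 1) tile and a padding-free T × T tile can be checked on the host, without a GPU. The sketch below is not the authors' algorithm, which the abstract does not spell out. It assumes 32 shared-memory banks of one 4-byte word each and a 32 × 32 tile, and it compares three ways of mapping a tile element to a shared-memory word: a plain row-major layout, the padded layout, and a diagonal skew, which is one commonly used padding-free indexing and is shown here only as an illustration. All function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int T = 32;      // tile width; assumed equal to the warp size
constexpr int BANKS = 32;  // assumed: 32 banks, each one 4-byte word wide

// Word offset of tile element (row, col) under three candidate layouts.
int plain(int row, int col)  { return row * T + col; }              // T x T, row-major
int padded(int row, int col) { return row * (T + 1) + col; }        // T x (T + 1), one pad word per row
int skewed(int row, int col) { return row * T + (col + row) % T; }  // T x T, diagonal skew (illustrative)

using Layout = int (*)(int, int);

// Largest number of threads of one warp that land in the same bank.
// columnAccess == true : thread t touches element (t, fixed), a tile column.
// columnAccess == false: thread t touches element (fixed, t), a tile row.
int maxConflict(Layout offset, bool columnAccess) {
    int worst = 0;
    for (int fixed = 0; fixed < T; ++fixed) {
        std::array<int, BANKS> hits{};  // per-bank access count, zero-initialised
        for (int t = 0; t < T; ++t) {
            int word = columnAccess ? offset(t, fixed) : offset(fixed, t);
            ++hits[word % BANKS];
        }
        worst = std::max(worst, *std::max_element(hits.begin(), hits.end()));
    }
    return worst;
}

// True if every tile element gets its own word inside a buffer of `words` words.
bool fitsWithoutOverlap(Layout offset, int words) {
    std::vector<bool> used(words, false);
    for (int row = 0; row < T; ++row) {
        for (int col = 0; col < T; ++col) {
            int w = offset(row, col);
            if (w < 0 || w >= words || used[w]) return false;
            used[w] = true;
        }
    }
    return true;
}
```

Under these assumptions, `maxConflict(plain, true)` evaluates to 32 (every thread of the warp hits the same bank on a column access), while the padded and skewed layouts both give 1 for row and column accesses. `fitsWithoutOverlap(skewed, T * T)` holds, whereas the padded layout needs `T * (T + 1)` words, which is the extra allocation the abstract refers to.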

Similar Papers

Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm
Abdulrahman Baqais ... Mohammed Assayony
International Journal of Networked and Distributed Computing | VOL. 2
01 Jan 2014

Towards Algebraic Modeling of GPU Memory Access for Bank Conflict Mitigation
Luca Ferranti ... Jani Boutellier
01 Oct 2019

Ballooning Graphics Memory Space in Full GPU Virtualization Environments
Younghun Park ... Sungyong Park
Scientific Programming | VOL. 2019
23 Apr 2019

Efficient Batched Predecessor Search in Shared Memory on GPUs
Ben Karsin ... Nodari Sitchinava
01 Dec 2015
