Abstract

Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate the development of new solvers for sparse and dense linear algebra. Matrix transposition is an important linear algebra procedure with a deep impact on a wide range of computational science and engineering applications. Several factors hinder the expected performance of large matrix transposes on GPU devices, chiefly the memory access pattern: uncoalesced accesses to global memory and bank conflicts in the shared memory of the streaming multiprocessors. In this paper, two matrix transpose algorithms are proposed to address these issues by ensuring coalesced access and conflict-free bank access. The proposed algorithms have execution times comparable to the bank-conflict-free matrix transpose implementation in the NVIDIA SDK. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T x T) of the problem space, whereas, to the best of our knowledge, published implementations require padded storage of T x (T + 1). We have also applied the proposed transpose algorithm to the recursive Gaussian implementation of the NVIDIA SDK and achieved about a 6% improvement in performance.
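The abstract does not describe the proposed T x T scheme itself. For context, the sketch below shows the conventional padded-tile transpose that the paper contrasts against: a T x (T + 1) shared-memory array whose extra column shifts consecutive rows onto different banks, so that column reads during the transposed write-back are conflict free while global loads and stores remain coalesced. The kernel name, tile size, and block shape (transposePadded, TILE_DIM, BLOCK_ROWS) are illustrative choices, not identifiers from the paper.

```cuda
#include <cuda_runtime.h>

#define TILE_DIM 32
#define BLOCK_ROWS 8

// Conventional padded-tile transpose (assumed baseline, not the paper's algorithm).
// Shared memory is TILE_DIM x (TILE_DIM + 1): the extra column places successive
// rows in different banks, avoiding conflicts when a warp reads a tile column.
__global__ void transposePadded(float *odata, const float *idata,
                                int width, int height)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;

    // Coalesced load: consecutive threads read consecutive global addresses.
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && (y + j) < height)
            tile[threadIdx.y + j][threadIdx.x] = idata[(y + j) * width + x];

    __syncthreads();

    // Swap block indices for the output; reads from shared memory walk a
    // column of the tile, which the padding keeps conflict free.
    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;

    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && (y + j) < width)
            odata[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}
```

The proposed algorithms claim the same conflict-free behavior while declaring the shared array as exactly TILE_DIM x TILE_DIM, removing the padding column shown above.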
