Abstract

In this manuscript, variants of a Jacobi solver implementation on general-purpose graphics processing units (GPGPUs) are proposed and compared. A parallel implementation of the finite element method (FEM) for Poisson's equation was profiled on shared-memory architectures as well as on GPGPUs to identify the computationally most expensive part of the FEM software, which is the linear-algebra Jacobi solver. Sparse matrices were used to store the systems of linear equations. Nine implementations of the Jacobi solver were developed and compared using various synchronization and computation methods: atomicAdd, atomicAdd_block, butterfly communication, grid synchronization, hybrid, and whole-GPU-based computation. Experiments showed that Jacobi implementations based on our butterfly communication method outperformed the CUDA 10.0 synchronization methods atomicAdd, atomicAdd_block, and grid synchronization. The GPU achieved a maximum speedup of 46 times on a GTX 1060 and 60 times on a Quadro P4000 with double-precision computations when compared with a sequential implementation on a Core i7-8750H. All development was performed with the GNU C/C++ compiler 7.3.0 on Ubuntu 18.04 with CUDA 10.0.

Highlights

  • High Performance Computing refers to the branch of computer science concerned with solving large and highly complex problems in science, engineering, and business. In High Performance Computing, many-core processors have gained more popularity than multi-core CPUs

  • Until 2006, it was very challenging for programmers to write programs for early graphics chips through a high-level programming interface, as the underlying code had to fit into APIs intended for painting graphics

  • The majority of computations in a finite element method (FEM) solver are of single-instruction, multiple-data flavor, which is why they are well suited to many-core architectures


Summary

INTRODUCTION

High Performance Computing refers to the branch of computer science concerned with solving large and highly complex problems in science, engineering, and business. GPUs have evolved into massively parallel, many-threaded multi-core units that support highly efficient computation on large blocks of data with high memory bandwidth. The limited size of on-chip memory, 48 KB of shared memory per block on devices of compute capability 6.x or above, is the main hurdle in utilizing registers or shared memory. This memory is organized into 32 banks, which serve the 32 threads of one warp concurrently. NVIDIA provides the barrier synchronization method __syncthreads() for block-level coordination (M. Aslam et al.: Performance Comparison of GPU-Based Jacobi Solvers Using CUDA Provided Synchronization Methods, FIGURE 2). When this method is called in a kernel, all threads of the block must wait at that point until every thread in the block has reached it. The solver is first implemented for a multi-core shared-memory processor and then on GPGPUs.
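The iteration that all nine GPU variants parallelize is the classical Jacobi sweep. A minimal sequential C++ sketch is shown below for reference; it uses a dense matrix for clarity (the paper stores the systems sparsely), and the function names and tolerances are illustrative, not taken from the paper's code:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} A[i][j] * x[j]) / A[i][i].
// Returns the max-norm difference between successive iterates.
double jacobi_sweep(const std::vector<std::vector<double>>& A,
                    const std::vector<double>& b,
                    const std::vector<double>& x,
                    std::vector<double>& x_new) {
    double diff = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        double sigma = 0.0;
        for (std::size_t j = 0; j < A.size(); ++j)
            if (j != i) sigma += A[i][j] * x[j];
        x_new[i] = (b[i] - sigma) / A[i][i];
        diff = std::max(diff, std::fabs(x_new[i] - x[i]));
    }
    return diff;
}

// Repeat sweeps until the max-norm update falls below tol.
// Converges for strictly diagonally dominant A, as arises from FEM
// discretizations of Poisson's equation.
std::vector<double> jacobi_solve(const std::vector<std::vector<double>>& A,
                                 const std::vector<double>& b,
                                 double tol = 1e-10, int max_iter = 10000) {
    std::vector<double> x(b.size(), 0.0), x_new(b.size(), 0.0);
    for (int k = 0; k < max_iter; ++k) {
        double diff = jacobi_sweep(A, b, x, x_new);
        x.swap(x_new);
        if (diff < tol) break;
    }
    return x;
}
```

Because each `x_new[i]` depends only on the previous iterate `x`, the outer loop over `i` is embarrassingly parallel; the synchronization the paper studies is needed only at the end of each sweep, before the next one begins.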
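The abstract reports that the butterfly communication variants outperformed the atomicAdd- and grid-synchronization-based ones. The paper's kernel code is not reproduced here, but the general butterfly (XOR-partner) exchange pattern it refers to can be sketched in plain C++ as follows; the round structure simulates what each GPU thread would do, and all names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Butterfly (XOR-partner) all-reduce over n values, n a power of two.
// In round r, slot i pairs with slot i ^ (1 << r) and both keep the sum;
// after log2(n) rounds every slot holds the total, so no slot needs a
// global atomic counter or a full-device barrier to obtain the result.
std::vector<double> butterfly_allreduce(std::vector<double> v) {
    const std::size_t n = v.size();  // assumed to be a power of two
    for (std::size_t stride = 1; stride < n; stride <<= 1) {
        std::vector<double> next(n);
        for (std::size_t i = 0; i < n; ++i)
            next[i] = v[i] + v[i ^ stride];  // exchange with XOR partner
        v.swap(next);
    }
    return v;
}
```

In a CUDA kernel the per-round exchange would go through shared memory or warp shuffles with a barrier between rounds, which avoids serializing all threads on a single atomically updated location.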

BACKGROUND
GPGPU BASED PARALLEL JACOBI SOLVER FORMULATION
EXPERIMENTS AND RESULTS

