Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs

Hamish J Macintosh, Jasmine E Banks, Neil A Kelson

doi:10.1155/2019/3679839

Abstract

Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC). Furthermore, it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementations of parallel linear system solvers, power consumption concerns are increasing in priority. This work presents the oclspkt routine. The oclspkt routine is a heterogeneous OpenCL implementation of the truncated SPIKE algorithm that can use FPGAs, GPUs, and CPUs to concurrently accelerate the solving of diagonally dominant tridiagonal linear systems. The routine is designed to solve tridiagonal systems of any size and can dynamically allocate optimised workloads to each accelerator in a heterogeneous environment depending on the accelerator’s compute performance. The truncated SPIKE FPGA solver is developed first for optimising OpenCL device kernel performance, global memory bandwidth, and interleaved host to device memory transactions. The FPGA OpenCL kernel code is then refactored and optimised to best exploit the underlying architecture of the CPU and GPU. An optimised TDMA OpenCL kernel is also developed to act as a serial baseline performance comparison for the parallel truncated SPIKE kernel since no FPGA tridiagonal solver capable of solving large tridiagonal systems was available at the time of development. The individual GPU, CPU, and FPGA solvers of the oclspkt routine are 110%, 150%, and 170% faster, respectively, than comparable device-optimised third-party solvers and applicable baselines. Assessing heterogeneous combinations of compute devices, the GPU + FPGA combination is found to have the best compute performance and the FPGA-only configuration is found to have the best overall estimated energy efficiency.
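The TDMA (Thomas algorithm) named above as the serial baseline can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' OpenCL kernel. The function name `tdma_solve` and the array layout (`lower[0]` and `upper[-1]` unused) are assumptions made for this example, and the sketch relies on diagonal dominance so that no pivoting is needed.

```python
def tdma_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (TDMA).

    Hypothetical layout for this sketch: lower[i] multiplies x[i-1],
    diag[i] multiplies x[i], upper[i] multiplies x[i+1].
    lower[0] and upper[-1] lie outside the matrix and are ignored.
    Assumes a diagonally dominant system, so no pivoting is performed.
    """
    n = len(diag)
    c_prime = [0.0] * n  # modified super-diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal row by row.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: each unknown depends on the one after it.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Small diagonally dominant system whose exact solution is [1, 2, 3, 4].
lower = [0.0, 1.0, 1.0, 1.0]
diag = [4.0, 4.0, 4.0, 4.0]
upper = [1.0, 1.0, 1.0, 0.0]
rhs = [6.0, 12.0, 18.0, 19.0]
solution = tdma_solve(lower, diag, upper, rhs)
```

Both loops carry a dependency from one row to the next, which is why the TDMA serves as a serial reference point for the parallel truncated SPIKE kernel.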

Highlights

The SPIKE algorithm has been implemented with good results to solve banded linear systems using CPUs and GPUs and in CPU + GPU heterogeneous environments often using vendor-specific programming paradigms [6].
Given the ubiquity of tridiagonal linear system problems in engineering, economic, and scientific fields, it is no surprise that significant research has been undertaken to address the need for larger models and higher resolution simulations.
We have previously investigated the feasibility of FPGA implementations of parallel algorithms including the parallel cyclic reduction and SPIKE [14] for solving small tridiagonal linear systems. This previous work utilised OpenCL to produce portable implementations to target FPGAs and GPUs. The current work again utilises OpenCL since this programming framework allows developers to target a wide range of compute devices including FPGAs, CPUs, and GPUs with a unified language.

Summary

Introduction

The SPIKE algorithm has been implemented with good results to solve banded linear systems using CPUs and GPUs and in CPU + GPU heterogeneous environments often using vendor-specific programming paradigms [6]. A scalable SPIKE implementation targeting CPUs and GPUs in a clustered HPC environment to solve massive diagonally dominant linear systems has previously been demonstrated with good computation and communication. The motivation for this paper is to evaluate the feasibility of utilising FPGAs, along with GPUs and CPUs concurrently in a heterogeneous computing environment in order to accelerate the solving of a diagonally dominant tridiagonal linear system. We present the oclspkt routine, a heterogeneous OpenCL implementation of the truncated SPIKE algorithm that can dynamically load balance work allocated to FPGAs, GPUs, and CPUs concurrently or in isolation, in order to solve tridiagonal linear systems of any size.
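One simple way to picture the load balancing described above is to split the rows of the system in proportion to each device's measured throughput. The sketch below is only an illustration of that idea. The function name `split_workload`, the device labels, and the throughput figures are hypothetical, and this excerpt does not state how the oclspkt routine actually computes its allocation.

```python
def split_workload(total_rows, throughputs):
    """Divide total_rows among devices in proportion to throughput.

    throughputs maps a device label to a relative compute rate
    (hypothetical values, e.g. rows solved per second in a trial run).
    """
    total_rate = sum(throughputs.values())
    # Round each share down so the allocations never exceed total_rows.
    shares = {
        name: int(total_rows * rate / total_rate)
        for name, rate in throughputs.items()
    }
    # Hand any rows lost to rounding to the fastest device.
    leftover = total_rows - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += leftover
    return shares


# Assumed relative rates, chosen for illustration only.
shares = split_workload(1000, {"fpga": 1.0, "gpu": 3.0, "cpu": 1.0})
```

Passing a single device reduces this to the "in isolation" case, where that device receives every row.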

Background

Evaluation

Conclusion

Journal: International Journal of Reconfigurable Computing	Publication Date: Oct 13, 2019
Citations: 7	License type: CC BY 4.0
