Fast finite difference Poisson solvers on heterogeneous architectures

Pedro Valero-Lara, Alfredo Pinelli, Manuel Prieto-Matias

doi:10.1016/j.cpc.2013.12.026

Open Access

Abstract

In this paper we propose and evaluate a set of new strategies for the solution of three-dimensional separable elliptic problems on CPU–GPU platforms. The numerical solution of the system of linear equations arising from the discretization of these operators often represents the most time-consuming part of larger simulation codes tackling a variety of physical situations. Incompressible fluid flows, electromagnetic problems, heat transfer and solid mechanics simulations are just a few examples of application areas that require efficient solution strategies for this class of problems. GPU computing has emerged as an attractive alternative to conventional CPUs for many scientific applications. High speedups over CPU implementations have been reported, and this trend is expected to continue with improved programming support and tighter CPU–GPU integration. These speedups by no means imply that CPU performance is no longer critical: the conventional CPU-control/GPU-compute pattern used in many applications, in which the CPU merely orchestrates work offloaded to the GPU, wastes much of the CPU's computational power. Our parallel implementation of a classical cyclic reduction algorithm, which tackles the large linear systems arising from the discretized elliptic problem, schedules computation cooperatively on both the GPU and the CPUs. The experimental results demonstrate the effectiveness of this approach.
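
The abstract does not spell out how work is divided between the devices. One simple policy, shown here purely as a hypothetical illustration and not as the authors' scheduler, assumes the work can be expressed as a batch of independent subproblems (for instance, the 2D systems obtained by decoupling one direction of a separable 3D problem) and splits that batch in proportion to measured per-device throughput, so that the CPU and the GPU finish at about the same time. The function `split_work` and its `cpu_rate`/`gpu_rate` parameters are assumptions introduced for this sketch.

```python
import math


def split_work(num_systems, cpu_rate, gpu_rate):
    """Hypothetical static CPU/GPU split of independent subproblems.

    cpu_rate and gpu_rate are assumed, measured throughputs
    (subproblems per second). Returns (n_cpu, n_gpu) chosen so that the
    device that finishes last finishes as early as possible.
    """
    if num_systems < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("need num_systems >= 0 and positive rates")

    def makespan(n_gpu):
        # Both devices run concurrently; the batch ends when the later one finishes.
        return max((num_systems - n_gpu) / cpu_rate, n_gpu / gpu_rate)

    # Real-valued optimum: both devices finish at exactly the same time.
    ideal = num_systems * gpu_rate / (cpu_rate + gpu_rate)
    # The integer optimum is one of the two integers bracketing the real optimum.
    candidates = sorted({math.floor(ideal), math.ceil(ideal)})
    n_gpu = min(candidates, key=makespan)
    return num_systems - n_gpu, n_gpu


# Example: a GPU three times faster than the CPU cores takes 3/4 of the work.
print(split_work(100, cpu_rate=1.0, gpu_rate=3.0))  # (25, 75)
```

A static split like this is only a starting point; a real cooperative scheduler would typically also account for PCIe transfer time and for the CPU cores that stay busy driving the GPU.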

Highlights

If we discretize Eq. (6) with Dirichlet or Neumann boundary conditions prescribed on the edges of a square, using the usual five-point scheme with the discrete unknowns ordered lexicographically, we obtain a linear system of m × n equations, $Au = \tilde{g}$, where $A$ is a block tridiagonal matrix.
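
For reference, the textbook form of this matrix is sketched below. It is not reproduced from the paper and rests on stated assumptions: uniform grid spacing $h$ in both directions, Dirichlet conditions whose boundary values have been moved into $\tilde{g}$, the stencil scaled by $h^2$, and lexicographic ordering with the m unknowns of each grid line stored contiguously. Neumann conditions would modify the first and last rows of the affected blocks.

$$
u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = h^2 g_{i,j}
\quad\Longrightarrow\quad
A = \begin{pmatrix}
T & I & & \\
I & T & \ddots & \\
 & \ddots & \ddots & I \\
 & & I & T
\end{pmatrix},
\qquad
T = \begin{pmatrix}
-4 & 1 & & \\
1 & -4 & \ddots & \\
 & \ddots & \ddots & 1 \\
 & & 1 & -4
\end{pmatrix}
$$

Here $T$ and the identity $I$ are m × m, and there are n diagonal blocks, which gives the m × n unknowns of the system.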

Summary

Introduction

The era of single-threaded processors has come to an end because of the limitations of CMOS technology; in response, most hardware manufacturers are designing and developing multi-core processors and specialized hardware accelerators such as GPUs [6, 16, 17]. In this paper we study the implementation of a fast solver based on a block cyclic reduction algorithm to tackle the linear systems that arise when discretizing a three-dimensional separable elliptic problem with standard finite differences. Reference [5] analyzes the performance of a block tridiagonal benchmark on GPUs. Other authors have addressed topics that are related to the present contribution. The BLKTRI code [13] is the most popular approach for solving the block tridiagonal systems that arise from separable elliptic partial differential equations, although it is not well suited to dense blocks. Yao Zhang et al. [7] proposed hybrid algorithms that combine cyclic reduction (CR) with other tridiagonal solvers such as Parallel Cyclic Reduction (PCR) or Recursive Doubling (RD).
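
The core idea of cyclic reduction can be illustrated on a single scalar tridiagonal system. The sketch below is a plain serial Python version written for illustration, not the authors' CPU/GPU code; it assumes n = 2^k − 1 unknowns and a matrix that needs no pivoting (for example, a diagonally dominant one). Each level eliminates every other unknown, and all eliminations within a level are independent, which is what makes the method attractive for parallel hardware. Conceptually, the block variant studied here replaces the scalar coefficients with matrix blocks.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by scalar cyclic reduction.

    a: sub-diagonal (a[0] is ignored and treated as 0)
    b: main diagonal
    c: super-diagonal (c[-1] is ignored and treated as 0)
    d: right-hand side
    Assumes len(b) == 2**k - 1 and that no pivoting is required.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]
    if (n + 1) & n:  # n + 1 must be a power of two
        raise ValueError("cyclic_reduction expects n = 2**k - 1 unknowns")
    a = list(a)
    c = list(c)
    a[0], c[-1] = 0.0, 0.0  # couplings outside the matrix are zero

    # Forward reduction: each odd-indexed equation absorbs its two even
    # neighbours, giving a half-size tridiagonal system in the odd unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        gamma = -c[i] / b[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        rc.append(gamma * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])

    odd = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even unknowns from their odd neighbours.
    x = [0.0] * n
    for k, i in enumerate(range(1, n, 2)):
        x[i] = odd[k]
    for j in range(0, n, 2):
        left = x[j - 1] if j > 0 else 0.0
        right = x[j + 1] if j < n - 1 else 0.0
        x[j] = (d[j] - a[j] * left - c[j] * right) / b[j]
    return x


# Usage: the 1D second-difference matrix tridiag(-1, 2, -1) with a known solution.
n = 7
a = [-1.0] * n
b = [2.0] * n
c = [-1.0] * n
x_true = [float(i + 1) for i in range(n)]
d = [b[i] * x_true[i]
     + (a[i] * x_true[i - 1] if i > 0 else 0.0)
     + (c[i] * x_true[i + 1] if i < n - 1 else 0.0)
     for i in range(n)]
x = cyclic_reduction(a, b, c, d)
print(x)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

By contrast, PCR keeps all n equations at every level, trading extra arithmetic for more parallelism and no back-substitution phase.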

Three Dimensional Elliptic Systems

Parallel Tridiagonal Algorithms

Parallel Block Cyclic Reduction

Parallel Three Dimensional Elliptic Systems

Findings

Conclusions

Journal: Computer Physics Communications
Publication Date: Jan 3, 2014
Citations: 35
License type: other-oa

Similar Papers

CGMBE: a model-based tool for the design and implementation of real-time image processing applications on CPU–GPU platforms
Jiahao Wu ... Timothy Blattner
Journal of Real-Time Image Processing | VOL. 18
07 Jul 2020

Dynamic Partitioning-based JPEG Decompression on Heterogeneous Multicore Architectures
Wasuwee Sodsong ... Seongwook Chung
07 Feb 2014

GPGPU implementation of a synaptically optimized, anatomically accurate spiking network simulator
Ruggero Scorcioni
BMC Neuroscience | VOL. 11
01 Jul 2010

GPGPU implementation of a synaptically optimized, anatomically accurate spiking network simulator
R Scorcioni
01 May 2010