Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates

T Malas, H Ltaief, G Wellein, G Hager, D Keyes, H Stengel

doi:10.1137/140991133

Abstract

The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.

Journal: SIAM journal on scientific computing : a publication of the Society for Industrial and Applied Mathematics
Publication Date: Jan 1, 2015
Citations: 94

Similar Papers

Revisiting Temporal Blocking Stencil Optimizations
Lingqi Zhang ... Peng Chen
21 Jun 2023

LEVERAGING SHARED CACHES FOR PARALLEL TEMPORAL BLOCKING OF STENCIL CODES ON MULTICORE PROCESSORS AND CLUSTERS
Markus Wittmann ... Georg Hager
Parallel Processing Letters | VOL. 20
01 Dec 2010

Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory
Markus Wittmann ... Georg Hager
01 Apr 2010

Optimal Temporal Blocking for Stencil Computation
Takayuki Muranushi ... Junichiro Makino
Procedia computer science | VOL. 51
01 Jan 2015
