Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters

Kumiko Maeda, Munehiro Doi, Hideaki Komatsu, Ryutaro Himeno, Masana Murase, Shigeho Noda

doi:10.1109/ipdps.2012.57

Abstract

Overlapping computation with communication is key to accelerating stencil applications on parallel computers, especially GPU clusters. However, writing such overlapped code is a time-consuming part of stencil application development. To address this problem, we developed an automatic code generation tool that produces a parallel stencil application with latency hiding from its dataflow model. With this tool, users visually construct the workflows of stencil applications in a dataflow programming model. Our dataflow compiler determines a data decomposition policy for each application and generates source code that overlaps the stencil computations with communication (MPI and PCIe). We demonstrate two overlapping models: a CPU-GPU hybrid execution model and a GPU-only model. We evaluate our scheduling performance with a CFD benchmark that computes 19-point 3D stencils, achieving 1.45 TFLOPS in single precision on a cluster with 64 Tesla C1060 GPUs.
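
The latency-hiding idea in the abstract can be illustrated with a minimal sketch. It is not the code the authors' tool generates. It assumes a 1D 3-point averaging stencil as a stand-in for the 19-point 3D stencil, two subdomains in one process, and a thread pool standing in for the MPI and PCIe transfers. All names (`stencil_point`, `fetch_halo`, `overlapped_step`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil_point(left, center, right):
    """3-point averaging stencil (stand-in for the paper's 19-point 3D stencil)."""
    return (left + center + right) / 3.0


def reference_step(u):
    """One step over the whole domain; the two end points are held fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = stencil_point(u[i - 1], u[i], u[i + 1])
    return out


def fetch_halo(neighbor, index):
    """Hypothetical stand-in for an MPI/PCIe transfer of one boundary value."""
    return neighbor[index]


def overlapped_step(left_part, right_part, pool):
    """Advance two subdomains one step, overlapping halo fetch with interior work."""
    # 1. Start the halo transfers asynchronously.
    halo_for_left = pool.submit(fetch_halo, right_part, 0)
    halo_for_right = pool.submit(fetch_halo, left_part, len(left_part) - 1)

    # 2. Interior points need no remote data, so compute them in the meantime.
    new_left = list(left_part)
    new_right = list(right_part)
    for i in range(1, len(left_part) - 1):
        new_left[i] = stencil_point(left_part[i - 1], left_part[i], left_part[i + 1])
    for i in range(1, len(right_part) - 1):
        new_right[i] = stencil_point(right_part[i - 1], right_part[i], right_part[i + 1])

    # 3. Wait for the halos, then finish the points on the subdomain boundary.
    new_left[-1] = stencil_point(left_part[-2], left_part[-1], halo_for_left.result())
    new_right[0] = stencil_point(halo_for_right.result(), right_part[0], right_part[1])
    return new_left, new_right


if __name__ == "__main__":
    u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = overlapped_step(u[:4], u[4:], pool)
    # The decomposed, overlapped step should match the single-domain step.
    assert left + right == reference_step(u)
```

In a real GPU cluster code, step 1 would typically be nonblocking MPI calls and asynchronous device-host copies, and step 2 would be a kernel launch on the interior region; the sketch only shows the ordering that lets the transfer time hide behind the interior computation.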

Similar Papers

Leveraging OmpSs to Exploit Hardware Accelerators
Florentino Sainz ... Jose L Bosque
01 Oct 2014

Scenarios in Dataflow Modeling and Analysis
Marc C W Geilen ... Kees G W Goossens
01 Jan 2020

Interactive Debugging of Dynamic Dataflow Embedded Applications
Kevin Pouget ... Miguel Santana
01 May 2013

BPDF: A statically analyzable dataflow model with integer and boolean parameters
Vagelis Bebelis ... Pascal Fradet
08 Jul 2013
