Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

Chao-Tung Yang, Chih-Lin Huang, Cheng-Fang Lin

doi:10.1016/j.cpc.2010.06.035

Abstract

NVIDIA's CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many-core GPUs and scales transparently to hundreds of cores; scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster that consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.
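The partitioning idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scheme, so the even static block split below is an assumption, and the names `partition_iterations`, `process_chunk`, and `run` are hypothetical. Plain Python stands in for both the MPI ranks and the CUDA kernel; the sketch only shows how a loop's iteration space might be divided by the number of GPU nodes and then processed chunk by chunk.

```python
def partition_iterations(total_iterations, num_nodes):
    """Split range(total_iterations) into num_nodes contiguous chunks.

    Assumption: an even static block partition. The paper's actual
    scheduling policy is not stated in the abstract.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(total_iterations, num_nodes)
    chunks = []
    start = 0
    for rank in range(num_nodes):
        # The first `extra` ranks take one additional iteration each.
        size = base + (1 if rank < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def process_chunk(data, start, stop):
    """Hypothetical stand-in for the per-node CUDA kernel launch."""
    return [x * x for x in data[start:stop]]


def run(data, num_nodes):
    """Process every chunk in rank order and concatenate the results.

    In a real hybrid program each chunk would belong to a separate MPI
    process; here the ranks are simulated sequentially.
    """
    results = []
    for start, stop in partition_iterations(len(data), num_nodes):
        results.extend(process_chunk(data, start, stop))
    return results
```

In an actual hybrid program, each MPI process would compute only its own `(start, stop)` pair from its rank and hand that range to a GPU kernel, rather than looping over all chunks in one process as this sketch does.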

Journal: Computer Physics Communications
Publication Date: Jul 16, 2010
Citations: 112

Similar Papers

Hybrid Parallel Programming on GPU Clusters
Chao-Tung Yang ... Cheng-Fang Lin
01 Sep 2010

Performance‐based parallel loop self‐scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters
Chao‐Tung Yang ... Chao‐Chin Wu
Concurrency and Computation: Practice and Experience | VOL. 23
26 Sep 2010

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross–Pitaevskii equation
Vladimir Lončar ... Antun Balaž
Computer Physics Communications | VOL. 209
06 Sep 2016

Performance-Based Parallel Loop Self-scheduling on Heterogeneous Multicore PC Clusters
Chao-Tung Yang ... Jen-Hsiang Chang
01 Jan 2009
