Analysis of scalable data-privatization threading algorithms for hybrid MPI/OpenMP parallelization of molecular dynamics

Manaschai Kunaseth, James N Glosli, David F Richards, Rajiv K Kalia, Aiichiro Nakano, Priya Vashishta

doi:10.1007/s11227-013-0915-x

Abstract

We propose and analyze threading algorithms for hybrid MPI/OpenMP parallelization of a molecular-dynamics simulation, which are scalable on large multicore clusters. Two data-privatization thread scheduling algorithms via nucleation-growth allocation are introduced: (1) compact-volume allocation scheduling (CVAS); and (2) breadth-first allocation scheduling (BFAS). The algorithms combine fine-grain dynamic load balancing and minimal memory-footprint data privatization threading. We show that the computational costs of CVAS and BFAS are bounded by $\Theta(n^{5/3} p^{-2/3})$ and $\Theta(n)$, respectively, for $p$ threads working on $n$ particles on a multicore compute node. Memory consumption per node of both algorithms scales as $O(n + n^{2/3} p^{1/3})$, but CVAS has smaller prefactors due to a geometric effect. Based on these analyses, we derive the selection criterion between the two algorithms in terms of the granularity, $n/p$. We observe that memory consumption is reduced by 75% for $p = 16$ and $n = 8{,}192$ compared to a naïve data privatization, while maintaining thread imbalance below 5%. We obtain a strong-scaling speedup of 14.4 with 16-way threading on a four quad-core AMD Opteron node. In addition, our MPI/OpenMP code achieves 2.58× and 2.16× speedups over the MPI-only implementation on 32,768 cores of BlueGene/P for 0.84 and 1.68 million particle systems, respectively.
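The scaling expressions in the abstract can be explored numerically. Dividing the CVAS bound by the BFAS bound gives n^(5/3) p^(-2/3) / n = (n/p)^(2/3), which is one way to see why a selection criterion would depend only on the granularity n/p. The sketch below is illustrative only: the function names are hypothetical, all prefactors are set to 1 (an assumption, since the abstract does not give them), and it is not the authors' code. With unit prefactors it is not expected to reproduce the measured 75% memory reduction or the actual crossover point between the two algorithms.

```python
# Illustrative sketch of the scaling expressions quoted in the abstract.
# ASSUMPTION: every prefactor is 1; the abstract does not state the constants.


def scheduling_cost_ratio(n: int, p: int) -> float:
    """Ratio of the CVAS bound n^(5/3) p^(-2/3) to the BFAS bound n.

    Algebraically this reduces to (n/p)^(2/3), a function of the
    granularity n/p alone.
    """
    return (n / p) ** (2.0 / 3.0)


def memory_scaling_terms(n: int, p: int) -> tuple[float, float]:
    """Return the two terms of the memory bound n + n^(2/3) p^(1/3)."""
    return float(n), n ** (2.0 / 3.0) * p ** (1.0 / 3.0)


def parallel_efficiency(speedup: float, p: int) -> float:
    """Strong-scaling efficiency: measured speedup divided by thread count."""
    return speedup / p


if __name__ == "__main__":
    n, p = 8192, 16  # problem size and thread count quoted in the abstract
    print("granularity n/p:", n / p)
    print("CVAS/BFAS bound ratio:", scheduling_cost_ratio(n, p))
    print("memory terms (n, n^(2/3) p^(1/3)):", memory_scaling_terms(n, p))
    print("efficiency at speedup 14.4:", parallel_efficiency(14.4, p))
```

For n = 8,192 and p = 16 the granularity is 512, so the bound ratio is 512^(2/3) = 64, and the reported speedup of 14.4 on 16 threads corresponds to a parallel efficiency of 0.90.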

Journal: The Journal of Supercomputing
Publication Date: Apr 5, 2013
Citations: 39
