Optimizing Barrier Algorithms on Asymmetric Subsystems of NUMA Machines

Mikhail Kurnosov, Elizaveta Tokmasheva

doi:10.1109/usbereit51232.2021.9455093

Abstract

This paper investigates algorithms for barrier synchronization in MPI applications on HPC clusters of NUMA machines. We consider the case in which all MPI processes that need to be synchronized reside on the same multi-socket NUMA machine. This problem arises, in particular, in hierarchical (topology-aware) barriers. In barrier algorithms for SMP/NUMA systems, processes communicate through shared counters and flags in memory. To minimize barrier latency, it is important to place the shared counters and flags in the memory of the NUMA node that has the minimal total distance to the other NUMA nodes in use. We propose the MinNumaDist algorithm for choosing the root process, which allocates the shared flags and counters in the memory of its NUMA node. The algorithm selects the root rank with the minimal total distance from its NUMA node to the NUMA nodes of all remaining processes. It reduces barrier synchronization time on asymmetric subsystems of processor cores, that is, when NUMA nodes and processor sockets are assigned different numbers of processes. Our experiments on dual-socket NUMA machines show that MinNumaDist decreases the latency of centralized barrier algorithms (central counter, flat tree, flat tree gather/release) by 10-170% on average.
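The root selection rule described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation. The function name `min_numa_dist_root`, the distance matrix values (chosen to resemble what `numactl --hardware` reports on a dual-socket machine), the process placement, and the tie-breaking rule (lowest rank wins) are all assumptions, since the abstract does not specify them.

```python
from typing import Sequence


def min_numa_dist_root(rank_nodes: Sequence[int],
                       dist: Sequence[Sequence[int]]) -> int:
    """Return the rank whose NUMA node has the smallest total distance
    to the NUMA nodes of all remaining ranks.

    rank_nodes[r] is the NUMA node on which rank r runs (assumed known).
    dist[a][b] is the distance from NUMA node a to NUMA node b.
    Ties are broken in favor of the lowest rank (an assumption).
    """
    if not rank_nodes:
        raise ValueError("at least one process is required")

    best_rank = 0
    best_cost = None
    for rank, node in enumerate(rank_nodes):
        # Sum distances from this rank's node to every other rank's node.
        cost = sum(dist[node][other_node]
                   for other, other_node in enumerate(rank_nodes)
                   if other != rank)
        if best_cost is None or cost < best_cost:
            best_rank, best_cost = rank, cost
    return best_rank


# Hypothetical dual-socket machine with one NUMA node per socket:
# local distance 10, remote distance 21.
DIST = [[10, 21],
        [21, 10]]

# Asymmetric placement: rank 0 on node 0, ranks 1..3 on node 1.
RANK_NODES = [0, 1, 1, 1]

root = min_numa_dist_root(RANK_NODES, DIST)
print("root rank:", root)
```

With this placement, rank 0 has a total distance of 63 to the other processes, while each rank on node 1 has a total of 41, so the shared counters and flags would be allocated on node 1, where most of the processes run.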

Similar Papers

HASO: A hot-page aware scheduling optimization method in virtualized NUMA systems
Butian Huang ... Qinming He
01 Apr 2016

Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures
Kaifang Zhang ... Yong Dou
The Journal of Supercomputing | VOL. 77
28 Apr 2021

A barrier optimization framework for NUMA multi‐core system
Zhengming Yi ... Fei Chen
Concurrency and Computation: Practice and Experience | VOL. 32
21 Oct 2019

Barrier Optimization on Asymmetrical NUMA Subsystems
M Kurnosov ... E Tokmasheva
The Herald of the Siberian State University of Telecommunications and Informatics
18 Mar 2021
