A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems

Mulya Agung, Muhammad Alfian Amrizal, Kazuhiko Komatsu, Hiroyuki Takizawa, Ryusuke Egawa

doi:10.1109/hipc.2017.00026

Abstract

MPI process placement is an important step toward scalable performance on modern non-uniform memory access (NUMA) systems. A recent study on NUMA architectures has shown that, on modern NUMA systems, memory congestion can cause more severe performance degradation than poor data locality, because heavy congestion on memory controllers leads to long latencies. However, conventional work on MPI process placement has focused on locality in order to minimize remote-access communication. Moreover, maximizing locality may actually degrade performance because it can increase the load imbalance among nodes in a modern NUMA system. Thus, a process placement algorithm must be designed to consider memory congestion. In this paper, a method that reconciles locality and memory congestion on modern NUMA systems is proposed. The method statically analyzes the application's communication pattern to optimize the process placement. A data clustering method is applied to time-series data of the MPI communications in order to identify data traffic that potentially causes memory congestion. The proposed method has been evaluated with the NPB kernels on a real NUMA system and in a simulation environment. Experimental results show that the proposed method can achieve a 1.6x performance improvement compared with the current state-of-the-art strategy.
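
The abstract does not spell out the clustering step or the placement rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical trace format of `(time, src, dst, nbytes)` events, bins the trace into per-process traffic time series, uses a simple 1-D two-cluster k-means on each process's peak per-window traffic to flag "heavy" processes, and then spreads those heavy processes round-robin across NUMA nodes. All function names, the trace layout, and the choice of k-means are assumptions made for illustration, and the locality side of the trade-off is deliberately left out.

```python
def per_process_series(trace, num_procs, window):
    """Bin (time, src, dst, nbytes) events into per-process traffic series."""
    horizon = max(t for t, _, _, _ in trace)
    bins = int(horizon // window) + 1
    series = [[0] * bins for _ in range(num_procs)]
    for t, src, dst, nbytes in trace:
        b = int(t // window)
        # Count the message against both endpoints of the transfer.
        series[src][b] += nbytes
        series[dst][b] += nbytes
    return series


def two_means_1d(values, iters=20):
    """Hypothetical helper: 1-D k-means with k=2; True marks the high cluster."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [v for v in values if v <= mid]
        high = [v for v in values if v > mid]
        if not high:  # all values fall into one cluster
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    threshold = (lo + hi) / 2
    return [v > threshold for v in values]


def place(series, num_nodes):
    """Assign processes to NUMA nodes, spreading heavy-traffic ones first."""
    peaks = [max(s) for s in series]
    heavy = two_means_1d(peaks)
    # Heavy processes come first, ordered by decreasing peak traffic.
    order = sorted(range(len(series)), key=lambda p: (not heavy[p], -peaks[p]))
    placement = {p: i % num_nodes for i, p in enumerate(order)}
    return placement, heavy


# Assumed toy trace: ranks 0 and 1 exchange large messages, 2 and 3 small ones.
trace = [(0.0, 0, 1, 1000), (0.5, 1, 0, 1000), (0.2, 2, 3, 10), (1.2, 3, 2, 10)]
series = per_process_series(trace, num_procs=4, window=1.0)
placement, heavy = place(series, num_nodes=2)
print(placement, heavy)
```

In this toy trace, a purely locality-driven placement would co-locate ranks 0 and 1 because they communicate the most, whereas the sketch puts them on different nodes so that their traffic does not hit the same memory controller. A real method would have to weigh that against the added remote-access cost.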

Similar Papers

Online MPI Process Mapping for Coordinating Locality and Memory Congestion on NUMA Systems
Supercomputing Frontiers and Innovations | VOL. 7
01 Mar 2020

DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems
Mulya Agung ... Ryusuke Egawa
IEEE Access | VOL. 8
01 Jan 2020

An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems
Mulya Agung ... Ryusuke Egawa
01 Oct 2019

A performance comparison of data and memory allocation strategies for sequence aligners on NUMA architectures
Josefina Lenis ... Miquel Angel Senar
Cluster Computing | VOL. 20
06 Jul 2017
