HPC Process and Optimal Network Device Affinitization

Ravindra Babu Ganapathi, Russell W Mcguire, Aravind Gopalakrishnan

doi:10.1109/tmscs.2018.2871444

Abstract

High Performance Computing (HPC) applications place heavy demands on hardware resources such as processors, memory, and storage. Artificial Intelligence and Machine Learning applications are taking center stage in HPC. This drives demand for more compute resources per node, which in turn raises the bandwidth required between compute nodes. New system design paradigms have emerged in which deploying more than one high-performance I/O device per node is beneficial. PCIe switches can increase the number of I/O devices connected to an HPC node, so some HPC nodes are designed with PCIe switches to provide a large number of PCIe slots. With multiple I/O devices per node, application programmers must consider HPC process affinity not only to compute resources but also to I/O devices. Mapping each process to processor cores and to the closest I/O device(s) adds complexity because of this three-way mapping and the variety of HPC node architectures. Operating systems perform a reasonable mapping of processes to processor cores. However, they lack the application developer's knowledge of the process workflow and of optimal I/O resource allocation when more than one I/O device is attached to the compute node. This paper is an extended version of our work published in [1]. Our previous work provided a solution for I/O device affinity choices by abstracting the device selection algorithm from HPC applications. In this paper, we extend the affinity solution to OpenFabrics Interfaces (OFI). OFI is a generic HPC API designed as part of the OpenFabrics Alliance. It supports a wider range of HPC programming models and applications across various HPC fabric vendors. MPI continues to be the dominant programming model for HPC, so we evaluate our solution with MPI-based micro benchmarks.
Our solution is then extended to OpenFabrics Interfaces, which supports other HPC programming models such as SHMEM, GASNet, and UPC. We propose solving NUMA issues at the lower level of the software stack, which forms the runtime for MPI and other programming models, independently of HPC applications. Our experiments are conducted on a two-node system. Each node is a two-socket Intel Xeon server attached to up to four Intel Omni-Path fabric devices over PCIe. The results show clear performance benefits from affinitizing each process with the best possible network device. With the OSU benchmark suite we observe up to 40 percent improvement in uni-directional bandwidth, 48 percent in bi-directional bandwidth, 32 percent in latency, and up to 40 percent in message rate. We also extend our evaluation to OFI operations and to an MPI benchmark used for genome assembly. With OFI Remote Memory Access (RMA) operations, we see bandwidth improvements of 32 percent for fi_read and 22 percent for fi_write. We also see latency improvements of 15 percent for fi_read and 14 percent for fi_write. The K-mer Matching Interface HASH benchmark improves by up to 25 percent when using the local network device instead of a network device connected to the remote Xeon socket.
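
As a rough illustration of the device selection idea described above, the C sketch below chooses a network device for a process based on NUMA locality. It assumes a simple policy: prefer devices on the same NUMA node (socket) as the process, spread processes round-robin across those local devices by local rank, and fall back to round-robin over all devices when none is local. This policy, and the names `struct nic` and `select_nic`, are hypothetical illustrations. They are not the authors' algorithm or any library API. In practice the process's NUMA node could come from the core it is pinned to, for example via `sched_getcpu()` and libnuma's `numa_node_of_cpu()`. The device's NUMA node is typically exposed by Linux sysfs for PCIe devices. Both are passed in directly here to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one network device (e.g. an Omni-Path HFI).
 * The NUMA node would normally be read from the OS; here it is given
 * directly so the selection logic stays self-contained. */
struct nic {
    const char *name; /* device name, e.g. "hfi1_0" (illustrative only) */
    int numa_node;    /* NUMA node (socket) the device's PCIe slot is attached to */
};

/* Hypothetical policy: return the index of the device a process on NUMA
 * node `proc_node` should use. Devices local to that node are preferred;
 * when several are local, processes are spread across them round-robin
 * using `local_rank` (the process's rank among processes on this node).
 * If no device is local, fall back to round-robin over all devices. */
int select_nic(const struct nic *nics, int ndev, int proc_node, int local_rank)
{
    assert(nics != NULL && ndev > 0 && local_rank >= 0);

    /* Count devices on the same NUMA node as the process. */
    int nlocal = 0;
    for (int i = 0; i < ndev; i++)
        if (nics[i].numa_node == proc_node)
            nlocal++;

    if (nlocal == 0)
        return local_rank % ndev; /* no local device: use any device */

    /* Pick the (local_rank % nlocal)-th local device. */
    int target = local_rank % nlocal;
    for (int i = 0; i < ndev; i++) {
        if (nics[i].numa_node == proc_node) {
            if (target == 0)
                return i;
            target--;
        }
    }
    return -1; /* unreachable: nlocal > 0 guarantees a match above */
}
```

Consider a node like the one in the experiments, with four devices split two per socket. Processes on socket 0 would alternate between that socket's two devices and never cross the inter-socket link to reach a device attached to the other socket. Round-robin across local devices is only one possible way to balance load. A real runtime might also weigh device bandwidth or PCIe switch topology.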

Journal: IEEE Transactions on Multi-Scale Computing Systems
Publication Date: Oct 1, 2018
Citations: 13

Similar Papers

MPI Process and Network Device Affinitization for Optimal HPC Application Performance
Ravindra Babu Ganapathi ... Russell W Mcguire
01 Aug 2017

Improving HPC Application Performance in Public Cloud
Rashid Hassani ... Peter Luksch
IERI Procedia | VOL. 10
01 Jan 2014

Enabling high performance computing in cloud computing environments
M Kumaresan ... G.K.D Prasanna Venkatesan
01 Apr 2017

Optimization of performance and scheduling of HPC applications in cloud using cloudsim and scheduling approach
D Boobala Muralitharan ... S Arockia Babi Reebha
01 May 2017
