Abstract

APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with six bidirectional off-board links providing 34 Gbps of raw bandwidth per direction, and it leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized through the adoption of an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled to an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions; preliminary results are encouraging, showing a 50% bandwidth increase for large-packet transfers. In this paper we describe the APEnet+ architecture, detail its hardware implementation, and discuss the impact of the specialized RDMA hardware on host-interface latency and bandwidth.
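For concreteness, the following is a minimal sketch of what a zero-copy GPU-to-GPU put could look like from the host side of such an RDMA engine. The ape_* functions and the ape_mem_handle_t type are hypothetical placeholders invented for illustration, not the actual APEnet+ driver interface; only the cudaMalloc call is a real CUDA API.

  /* Hypothetical RDMA interface, assumed for illustration only. */
  #include <cuda_runtime.h>
  #include <stddef.h>
  #include <stdint.h>

  typedef struct {
      uint64_t vaddr;   /* GPU virtual address registered with the NIC */
      uint32_t key;     /* handle returned by the (hypothetical) driver */
  } ape_mem_handle_t;

  int ape_register_gpu_mem(void *ptr, size_t len, ape_mem_handle_t *h);
  int ape_rdma_put(int dest_node, ape_mem_handle_t h, size_t off, size_t len);

  int send_gpu_buffer(int dest_node, size_t len)
  {
      void *gpu_buf;
      if (cudaMalloc(&gpu_buf, len) != cudaSuccess)
          return -1;

      /* Register the GPU buffer so the FPGA can pin its pages and
       * target them directly over PCI Express (hypothetical call). */
      ape_mem_handle_t h;
      if (ape_register_gpu_mem(gpu_buf, len, &h) != 0)
          return -1;

      /* One-sided put: the NIC reads straight from GPU memory and
       * injects the data into the 3D torus; no staging buffer in
       * host RAM and no CPU copy (hypothetical call). */
      return ape_rdma_put(dest_node, h, 0, len);
  }

The point of the sketch is the data path: once the buffer is registered, the transfer is driven entirely by the NIC, which is what removes the CPU and host memory from the critical path.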

Highlights

  • Scaling towards exaFLOPS systems in HPC requires selecting a high-performance interconnection network able to meet the requirements of low power consumption, high efficiency and resilience

  • A custom NIC for the QUonG cluster would have been prohibitively lengthy and costly to build without the resources offered by FPGAs, which allow a tighter design cycle and the fast development of a more reliable product

  • In this paper we describe a couple of iterations of this design cycle, which helped make the APEnet+ card the first P2P-enabled non-NVIDIA device, delivering off-board GPU-to-GPU data transfers with unprecedentedly low latency


Summary

Introduction

Scaling towards exaFLOPS systems in HPC requires selecting a high-performance interconnection network able to meet the requirements of low power consumption, high efficiency and resilience.

Peer-to-peer GPU memory access

A peculiar feature of APEnet+ can be exploited when the cluster nodes are equipped with Fermi- and Kepler-class NVIDIA GPUs: APEnet+ is the first non-NVIDIA device able to directly access their memory, leveraging their peer-to-peer (P2P) capabilities (a sketch of the user-space side of such an access is given after this section). In this way, remote GPU-to-GPU data transfers are possible without staging through host memory and without involving the CPU, resulting in very low transfer latency.

New hardware blocks for performance improvements

The use of FPGAs made a tight design cycle and a modular, reconfigurable architecture possible during the development of APEnet+. In this way we could evolve the architecture by working on the critical areas exposed by benchmarks, removing or reducing the performance bottlenecks.
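The user-space half of a P2P registration of that era can be sketched with the real CUDA driver API: cuPointerGetAttribute with CU_POINTER_ATTRIBUTE_P2P_TOKENS returns the tokens that a third-party NIC driver then hands to the NVIDIA kernel module (nvidia_p2p_get_pages) to pin the GPU pages and obtain bus addresses for DMA. The CUDA calls below are real; how the tokens reach the APEnet+ driver is not shown and is our assumption.

  #include <cuda.h>
  #include <stdio.h>

  /* Fetch the P2P tokens for a GPU buffer allocated with cuMemAlloc.
   * A non-NVIDIA device driver would forward these tokens to the
   * NVIDIA kernel module (nvidia_p2p_get_pages) to pin the pages and
   * obtain bus addresses usable by the FPGA's DMA engines. */
  int get_p2p_tokens(CUdeviceptr gpu_buf)
  {
      CUDA_POINTER_ATTRIBUTE_P2P_TOKENS tokens;

      if (cuPointerGetAttribute(&tokens,
                                CU_POINTER_ATTRIBUTE_P2P_TOKENS,
                                gpu_buf) != CUDA_SUCCESS)
          return -1;

      printf("p2pToken=%llu vaSpaceToken=%u\n",
             (unsigned long long)tokens.p2pToken, tokens.vaSpaceToken);
      return 0;
  }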

