Abstract

Modern Graphics Processing Units (GPUs) are now established accelerators for general-purpose computation. Tight integration between the GPU and the interconnection network is key to exploiting the full capability-computing potential of multi-GPU systems on large HPC clusters; an efficient and scalable interconnect is therefore a key technology for delivering GPUs to scientific HPC. In this paper we present the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with six fully bidirectional off-board links, each providing 34 Gbps of raw bandwidth per direction, and a PCIe x8 Gen2 link towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to achieve true zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities planned for 2013, focusing on the adoption of latest-generation 28 nm FPGAs and on preliminary tests performed on this new platform.
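The P2P capability of Fermi- and Kepler-class GPUs mentioned above is the mechanism that zero-copy transfers rely on. The sketch below is not the APEnet+ RDMA API; it is a minimal CUDA runtime example showing how peer access is enabled between two devices in a node and how a buffer is copied GPU-to-GPU without staging through host memory. Device IDs 0 and 1 and the 1 MiB payload size are assumptions made for illustration.

#include <cuda_runtime.h>
#include <stdio.h>

#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err));                              \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main(void)
{
    const size_t len = 1 << 20;   /* 1 MiB payload (arbitrary choice) */
    int can_access = 0;
    float *src = NULL, *dst = NULL;

    /* Check that GPU 0 can address GPU 1's memory directly. */
    CHECK(cudaDeviceCanAccessPeer(&can_access, 0, 1));
    if (!can_access) {
        fprintf(stderr, "P2P not supported between GPU 0 and GPU 1\n");
        return 1;
    }

    /* Allocate a buffer on each GPU; map GPU 1 into GPU 0's address space. */
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&src, len));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&dst, len));

    /* Direct GPU-to-GPU copy: no staging through host memory. */
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, len));

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}

The same device-pointer addressability is what a P2P-capable NIC exploits: once the GPU buffer is exposed on the PCIe bus, the network device can read from or write to it directly, which is the basis of the zero-copy GPU-to-GPU transfers described in the paper.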
