Abstract

APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect optimized for hybrid CPU-GPU clusters dedicated to high-performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with six bidirectional off-board links providing 34 Gbps of raw bandwidth per direction, and it leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain true zero-copy, low-latency GPU-to-GPU transfers. APEnet+ transfer latency is minimized through the adoption of an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled to an embedded microprocessor. This architecture provides a high-performance, low-latency offload engine for both the transmit and receive sides of data transactions; preliminary results are encouraging, showing a 50% bandwidth increase for large-packet transfers. In this paper we describe the APEnet+ architecture, detail its hardware implementation, and discuss the impact of the specialized RDMA hardware on host-interface latency and bandwidth.
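For concreteness, the following is a minimal sketch of what a zero-copy GPU-to-GPU put could look like from the host side of such an RDMA engine. The ape_* functions and the ape_mem_handle_t type are hypothetical placeholders invented for illustration, not the actual APEnet+ driver interface; only the cudaMalloc call is a real CUDA API.

  /* Hypothetical RDMA interface, assumed for illustration only. */
  #include <cuda_runtime.h>
  #include <stddef.h>
  #include <stdint.h>

  typedef struct {
      uint64_t vaddr;   /* GPU virtual address registered with the NIC */
      uint32_t key;     /* handle returned by the (hypothetical) driver */
  } ape_mem_handle_t;

  int ape_register_gpu_mem(void *ptr, size_t len, ape_mem_handle_t *h);
  int ape_rdma_put(int dest_node, ape_mem_handle_t h, size_t off, size_t len);

  int send_gpu_buffer(int dest_node, size_t len)
  {
      void *gpu_buf;
      if (cudaMalloc(&gpu_buf, len) != cudaSuccess)
          return -1;

      /* Register the GPU buffer so the FPGA can pin its pages and
       * target them directly over PCI Express (hypothetical call). */
      ape_mem_handle_t h;
      if (ape_register_gpu_mem(gpu_buf, len, &h) != 0)
          return -1;

      /* One-sided put: the NIC reads straight from GPU memory and
       * injects the data into the 3D torus; no staging buffer in
       * host RAM and no CPU copy (hypothetical call). */
      return ape_rdma_put(dest_node, h, 0, len);
  }

The point of the sketch is the data path: once the buffer is registered, the transfer is driven entirely by the NIC, which is what removes the CPU and host memory from the critical path.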

Highlights

  • Scaling towards exaFLOPS systems in HPC requires selecting a high-performance interconnection network able to meet the requirements of low power consumption, high efficiency and resilience

  • A custom NIC for the QUonG cluster would have been prohibitively lengthy and costly to build without the resources offered by FPGAs, which allow a tighter design cycle and the fast development of a more reliable product

  • In this paper we describe a couple of iterations of this design cycle, which helped make the APEnet+ card the first P2P-enabled non-NVIDIA device, delivering off-board GPU-to-GPU data transfers with unprecedentedly low latency


Summary

Introduction

Scaling towards exaFLOPS systems in HPC requires selecting a high-performance interconnection network able to meet the requirements of low power consumption, high efficiency and resilience.

Peer-to-peer GPU memory access

A peculiar feature of APEnet+ can be exploited when the cluster nodes are equipped with Fermi- and Kepler-class NVIDIA GPUs: APEnet+ is the first non-NVIDIA device able to directly access their memory, leveraging their peer-to-peer (P2P) capabilities (a sketch of the user-space side of such an access is given after this section). In this way, remote GPU-to-GPU data transfers are possible without staging through host memory and without involving the CPU, resulting in very low transfer latency.

New hardware blocks for performance improvements

The use of FPGAs made a tight design cycle and a modular, reconfigurable architecture possible during the development of APEnet+. In this way we could evolve the architecture by working on the critical areas exposed by benchmarks, removing or reducing the performance bottlenecks.
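The user-space half of a P2P registration of that era can be sketched with the real CUDA driver API: cuPointerGetAttribute with CU_POINTER_ATTRIBUTE_P2P_TOKENS returns the tokens that a third-party NIC driver then hands to the NVIDIA kernel module (nvidia_p2p_get_pages) to pin the GPU pages and obtain bus addresses for DMA. The CUDA calls below are real; how the tokens reach the APEnet+ driver is not shown and is our assumption.

  #include <cuda.h>
  #include <stdio.h>

  /* Fetch the P2P tokens for a GPU buffer allocated with cuMemAlloc.
   * A non-NVIDIA device driver would forward these tokens to the
   * NVIDIA kernel module (nvidia_p2p_get_pages) to pin the pages and
   * obtain bus addresses usable by the FPGA's DMA engines. */
  int get_p2p_tokens(CUdeviceptr gpu_buf)
  {
      CUDA_POINTER_ATTRIBUTE_P2P_TOKENS tokens;

      if (cuPointerGetAttribute(&tokens,
                                CU_POINTER_ATTRIBUTE_P2P_TOKENS,
                                gpu_buf) != CUDA_SUCCESS)
          return -1;

      printf("p2pToken=%llu vaSpaceToken=%u\n",
             (unsigned long long)tokens.p2pToken, tokens.vaSpaceToken);
      return 0;
  }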

