Ultra-low latency communication channels for FPGA-based HPC cluster

Roberto Sanchez Correa, Jean Pierre David

doi:10.1016/j.vlsi.2018.05.005

Abstract

FPGA technology offers numerous advantages for parallel computation, supported by low-latency on-chip communication. Nevertheless, clustering FPGAs to achieve greater computing power may require external high-speed, low-latency communication channels. Because of the overhead introduced by their complex features and functionality, existing off-the-shelf IP cores for standard high-speed communication often waste valuable clock cycles and bandwidth. This paper presents the implementation of an ultra-low latency inter-FPGA communication IP suitable for high-performance computing machines. Our IP achieved a half-round-trip end-to-end latency of 272 ns (34 clock cycles) and an aggregate bandwidth of 16 Gbps per node on a Virtex-5 FPGA. To test the proposed IP under high-performance conditions, we implemented an eight-FPGA parallel computing machine hosting 48 coprocessors interconnected through our custom-designed network. Experimental results show a global computational efficiency of 97.6%. The proposed architecture is scalable and easily portable to more recent FPGAs, which should lower the latency and increase the bandwidth even further.
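
As a quick sanity check on the figures quoted in the abstract, the sketch below derives the clock period and frequency implied by "272 ns (34 clock cycles)". The function names are illustrative only and do not come from the paper. The coprocessors-per-FPGA figure assumes the 48 coprocessors are spread evenly across the eight FPGAs, which the abstract does not state.

```python
# Figures quoted in the abstract.
HALF_ROUND_TRIP_NS = 272      # half-round-trip end-to-end latency
HALF_ROUND_TRIP_CYCLES = 34   # the same latency expressed in clock cycles
NUM_FPGAS = 8
NUM_COPROCESSORS = 48


def clock_period_ns(latency_ns: float, cycles: int) -> float:
    """Clock period implied by a latency given in both ns and cycles."""
    return latency_ns / cycles


def clock_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


period = clock_period_ns(HALF_ROUND_TRIP_NS, HALF_ROUND_TRIP_CYCLES)
frequency = clock_frequency_mhz(period)

# Assumption: coprocessors are distributed evenly across the FPGAs.
coprocessors_per_fpga = NUM_COPROCESSORS / NUM_FPGAS

print(f"clock period: {period} ns")
print(f"clock frequency: {frequency} MHz")
print(f"coprocessors per FPGA (assuming even split): {coprocessors_per_fpga}")
```

The two latency figures are consistent with an 8 ns clock period, that is, a 125 MHz clock for the communication logic.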

Full Text