Abstract
The FPGA technology offers numerous advantages in terms of parallel computation, which is supported by on-chip low latency communications. Nevertheless, clustering FPGAs to achieve a larger computing power may require external high-speed and low-latency communication channels. Because of the overhead due to complex features and functionalities, existing off-the-shelf IP cores for high-speed standard communication often waste valuable clock cycles and bandwidth. This paper presents the implementation of an ultra-low latency inter-FPGAs communication IP suitable for high performance computing machines. Our IP achieved 272 ns (34 clock cycles) half-round trip end-to-end latency and an aggregate bandwidth of 16 Gbps per node on Virtex-5 FPGA. To test the proposed IP under a high-performance situation, we implemented an eight-FPGA parallel computing machine hosting 48 coprocessors interconnected through our custom designed network. Experimental results show a global computational efficiency of 97.6%. The proposed architecture is scalable and easily portable to most recent FPGAs, which should lower the latency and increase the bandwidth even more.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.