Abstract

As the number of processors on a single chip grows, communication efficiency may dominate the performance of parallel programs, so achieving high-throughput communication is a clear goal. This paper proposes a novel 4-stage pipelined router and its corresponding network interface (NI) for a 3-stage Clos network. This study verifies the proposed structure using an Altera DE3 FPGA and demonstrates its performance using an in-house C++ simulator. To further improve the throughput, this study proposes a late release scheme (LRS) that reserves the allocated paths. Simulation results show throughput improvements of 19.11% and 18.64% under random and mixed traffic, respectively. The latency improvements are 5.1x and 2.53x in a Jacobi linear equation simulation with data sizes of 1k and 512, respectively. Compared with CRRD [4], the throughput improves by 5.07% with a 10-flit packet size and four middle switches. To make the implementation of the proposed fabric cost-efficient, this paper also varies the number of middle switches and arrives at a suggested value of four.
