Abstract
We describe a transport protocol suitable for BSPlib programs running on a cluster of PCs connected by a 100 Mbps Ethernet switch. The protocol provides a reliable packet-delivery mechanism that uses global knowledge of a program's communication pattern to maximise switch performance. Its performance is comparable to that of previous low-latency protocols on similar hardware, but because it adds reliability, it can be used directly by application software. For a modest budget of US$20,000 it is possible to build a machine that outperforms an IBM SP2 on all four NAS benchmarks reported (BT +80%, SP +70%, MG +9%, and LU +65%), and an SGI Origin 2000 on half of them (BT +10%, SP −24%, MG +10%, and LU −28%). The protocol has a CPU overhead of 1.5 μs for packet download and 3.6 μs for upload. Small packets can be communicated through the switch in a pipelined fashion every 21 μs. Application-to-application one-way latency is 29 μs plus the latency of the switch. A raw link bandwidth of 93 Mbps is achieved for 1400-byte packets, and 50 Mbps for 128-byte packets. This scales to eight processors communicating at 91 Mbps per link, giving a sustained global bandwidth of 728 Mbps.
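The bandwidth figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that the quoted Mbps values count packet bits at 10^6 bits per second, and the helper names `packet_time_us` and `global_bandwidth_mbps` are hypothetical, not part of the protocol or of BSPlib.

```python
def packet_time_us(packet_bytes: int, bandwidth_mbps: float) -> float:
    """Microseconds needed to move one packet at the given bandwidth.

    Assumes 1 Mbps = 10**6 bits per second, so bits / Mbps yields microseconds.
    """
    return packet_bytes * 8 / bandwidth_mbps


def global_bandwidth_mbps(links: int, per_link_mbps: float) -> float:
    """Aggregate bandwidth when every link sustains the same rate."""
    return links * per_link_mbps


# Figures quoted in the abstract.
small_us = packet_time_us(128, 50)       # 128-byte packets at 50 Mbps
large_us = packet_time_us(1400, 93)      # 1400-byte packets at 93 Mbps
total_mbps = global_bandwidth_mbps(8, 91)  # eight processors at 91 Mbps each

print(f"128-byte packet:  {small_us:.2f} us")
print(f"1400-byte packet: {large_us:.2f} us")
print(f"global bandwidth: {total_mbps} Mbps")
```

Under these assumptions a 128-byte packet takes about 20.5 μs, which is close to the 21 μs pipelined interval quoted for small packets, although the abstract does not state that the two measurements are related. Eight links at 91 Mbps reproduce the quoted 728 Mbps global bandwidth.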