Abstract

In a cluster of clusters used for parallel computing, it is important to fully utilize the inter-cluster network. Existing MPI implementations for clusters of clusters have two issues: 1) A single point-to-point communication cannot exploit the full bandwidth of the inter-cluster network, because each node uses a Gigabit Ethernet interface for inter-cluster communication even though more bandwidth is available between clusters. 2) Heavy packet loss and performance degradation occur under TCP/IP when many nodes generate short-term burst traffic. To overcome these issues, this paper proposes a novel method called the aggregate router method. In this method, multiple router nodes are set up in each cluster, and inter-cluster communication is performed via these router nodes. By striping a single message across multiple routers, the bottleneck caused by network interfaces is reduced. Packet congestion is also avoided by using high-speed interconnects within a cluster instead of TCP/IP. The aggregate router method is evaluated using the HPC Challenge Benchmarks and the NAS Parallel Benchmarks. The results show that the proposed method outperforms the existing method by 24% in the best case.
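
The striping idea can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `stripe` and `reassemble` are hypothetical, and the sketch assumes the message is simply cut into contiguous, near-equal chunks, one per router node, and concatenated in order at the receiving side.

```python
def stripe(message: bytes, num_routers: int) -> list[bytes]:
    """Split a message into num_routers contiguous chunks of near-equal size.

    Hypothetical helper: chunk i would be forwarded through router node i.
    """
    if num_routers < 1:
        raise ValueError("num_routers must be at least 1")
    base, extra = divmod(len(message), num_routers)
    chunks = []
    offset = 0
    for i in range(num_routers):
        # The first `extra` chunks carry one additional byte each.
        size = base + (1 if i < extra else 0)
        chunks.append(message[offset:offset + size])
        offset += size
    return chunks


def reassemble(chunks: list[bytes]) -> bytes:
    """Concatenate the chunks in router order to recover the message."""
    return b"".join(chunks)


if __name__ == "__main__":
    parts = stripe(b"abcdefghij", 3)
    print(parts)
    print(reassemble(parts))
```

Because each chunk travels through a different router node, no single network interface has to carry the whole message, which is the bottleneck the abstract describes.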
