High-radix hierarchical structures, such as the dragonfly, fat-tree, and torus, are cost-effective topologies for high-performance computing (HPC) networks. Among these, the dragonfly outperforms traditional topologies such as the fat-tree and torus in cost and scalability. However, imbalanced traffic patterns cause network congestion, which degrades performance. The routing algorithm strongly influences the performance of the dragonfly topology: it can be designed to avoid saturating global or local links and to avoid deadlock in the network. In this letter, we introduce an adaptive multi-level routing (AMLR) algorithm for dragonfly networks. AMLR defines three levels of routes. Dividing these routes carefully lets all paths in the network be used more effectively, so that traffic between groups is more balanced. We also propose a congestion control scheme that cooperates with AMLR during data transmission. Furthermore, congestion detection and notification are used to identify congested packets and inform the network. Evaluations show that the proposed adaptive multi-level routing and congestion control mechanism can relieve the imbalance between groups in a 100-node dragonfly topology. As a result, AMLR provides 26%, 98%, 78%, and 99% lower latencies, and 13%, 87%, 13%, and 128% higher throughputs than shortest-path routing under uniform, adv+i, hotspot, and permutation traffic, respectively.