Abstract

Dragonfly topologies are attracting great interest as one of the most promising interconnect options for High-Performance Computing (HPC) systems. However, Dragonflies contain physical cycles that may lead to traffic deadlocks unless the routing algorithm prevents them. In general, the existing deadlock-free routing algorithms proposed for Dragonflies, whether deterministic or adaptive, use Virtual Channels (VCs) to prevent cyclic dependencies. These topology-aware algorithms are difficult, or even infeasible, to implement in systems based on the InfiniBand (IB) architecture, which is currently the most widely used network technology in HPC systems. This is due to limitations in the IB specification regarding the way Virtual Lanes (VLs), which are similar to VCs, can be assigned to traffic flows. Indeed, none of the routing engines currently available in the official releases of the IB control software has been designed specifically for Dragonflies. In this paper, we present a new deterministic, minimal-path routing algorithm for Dragonflies that prevents deadlocks using VLs in accordance with the IB specification, so that it can be implemented straightforwardly in IB-based networks. We call this proposal D3R (Deterministic Deadlock-free Dragonfly Routing). Specifically, D3R maps each route to a single VL that depends on the destination group, following a specific order, so that cyclic dependencies (and hence deadlocks) are prevented. D3R is scalable, as it requires only 2 VLs to prevent deadlocks regardless of network size, i.e., fewer VLs than those required by the deadlock-free routing engines available in IB that are suitable for Dragonflies. Alternatively, D3R achieves higher throughput if an additional VL is used to reduce internal contention in the Dragonfly groups. We have implemented D3R as a new routing engine in OpenSM, the IB control software that includes the subnet manager. We have evaluated D3R through simulation and through experiments on a real IB-based cluster; the results show that, in general, D3R outperforms other routing engines.
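
To make the link between cyclic dependencies and VLs concrete, the following minimal sketch builds a channel dependency graph, in which each node is a (link, VL) pair and an edge means that some route holds one channel while requesting the next, and then checks that graph for cycles. It is only an illustration under stated assumptions: the topology is a toy 4-switch unidirectional ring rather than a Dragonfly, the per-route VL assignment is invented for the example and is not the D3R mapping rule, and the names `build_cdg`, `has_cycle` and `ring_routes` are hypothetical helpers.

```python
from collections import defaultdict


def build_cdg(routes):
    """Build a channel dependency graph from a list of routes.

    Each route is a list of channels, and each channel is a (link, vl) pair.
    An edge c1 -> c2 is added when a route uses c2 immediately after c1.
    """
    cdg = defaultdict(set)
    for route in routes:
        for held, requested in zip(route, route[1:]):
            cdg[held].add(requested)
    return cdg


def has_cycle(cdg):
    """Return True if the dependency graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in cdg.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: a cyclic dependency exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(cdg))


def ring_routes(vl_of_route, n=4):
    """Two-hop routes on a unidirectional ring of n switches (toy example).

    vl_of_route(src) picks the single VL used by the whole route that starts
    at switch src, mirroring the one-VL-per-route restriction.
    """
    routes = []
    for src in range(n):
        vl = vl_of_route(src)
        hops = [((src + k) % n, (src + k + 1) % n) for k in range(2)]
        routes.append([(link, vl) for link in hops])
    return routes


# All routes share VL 0: the dependencies close a cycle around the ring.
one_vl = ring_routes(lambda src: 0)
# Assumed assignment for illustration: the route starting at switch 3 uses VL 1.
two_vls = ring_routes(lambda src: 1 if src == 3 else 0)

print(has_cycle(build_cdg(one_vl)))   # True
print(has_cycle(build_cdg(two_vls)))  # False
```

In this toy case, moving one whole route to a second VL is enough to break the cycle, which illustrates how an assignment of one VL per route, chosen in a suitable order, can remove cyclic dependencies without changing the paths themselves.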
