High-Performance Computing and Datacenter systems, with their numerous endnodes, demand an efficient interconnection network to prevent performance bottlenecks. Fat-Tree topologies are preferred for their high bisection bandwidth and multiple shortest-path routes. While existing adaptive routing performs well under light congestion or in-network congestion, it struggles with incast congestion. This paper proposes a new technique, called Congestion-Aware Adaptive Routing (SCAR), which addresses both in-network and incast congestion. SCAR limits adaptivity for flows contributing to incast congestion by routing them deterministically, while employing adaptive routing for non-congesting flows. It also resolves in-network congestion by steering traffic flows through alternative routes. Simulation experiments on large Fat-Trees, using synthetic and trace-based traffic patterns that model realistic applications, demonstrate SCAR’s immediate reaction in mitigating in-network congestion and a reasonable delay during incast situations, whereas other state-of-the-art solutions cannot cope with incast and in-network congestion at the same time.