Abstract

Endpoint congestion is a bottleneck in high-performance computing (HPC) networks that severely degrades system performance, especially for latency-sensitive applications. When long messages (or flows) last far longer than the round-trip time (RTT), proactive or reactive countermeasures, which are effective solutions to endpoint congestion, can dynamically keep the injection rate within a proper range. However, many HPC applications produce hybrid traffic (a mix of short and long messages) and are dominated by short messages. Existing proactive congestion avoidance methods struggle to schedule the rapidly changing traffic caused by these short messages. In this paper, we first propose the Packet-Chaining Reservation Protocol (PCRP), a novel congestion management strategy that combines the advantages of proactive (scheduling the whole flow) and reactive (scheduling a single packet) congestion avoidance techniques. It is a compromise between the two scheduling approaches: we select a chain of packets as a flexible reservation granularity between the whole flow and a single packet. PCRP allows small flows to be transmitted speculatively without being discarded. It also gives small flows an appropriate priority based on the detected traffic conditions. PCRP responds quickly to network conditions, effectively avoiding endpoint congestion and reducing the average flow delay. However, PCRP is suitable only for short-flow-dominant traffic and performs poorly on other traffic, because it can starve longer flows and thus introduce unacceptable tail delay. Therefore, we further propose the Packet-Chaining Reservation Protocol with Adaptive Framework (PCRP+), a reservation protocol that flexibly adjusts its scheduling strategy according to the network load. By adopting the adaptive framework, PCRP+ achieves and maintains low tail delay and broad applicability across traffic patterns.
We conduct extensive experiments to evaluate PCRP+ and compare it with the Speculative Reservation Protocol (SRP) and the Bilateral Flow Reservation Protocol (BFRP), the two most typical proactive reservation-based protocols. Evaluation results demonstrate that our design reduces flow latency by an average of 25.71%, 21.21%, and 29.01% for hotspot traffic, uniform traffic, and GPCNeT traffic, respectively.
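
To make the idea of an intermediate reservation granularity concrete, the following is a minimal sketch that only counts how many separately scheduled units a flow yields under each granularity (whole flow, packet chain, single packet). It is not the PCRP implementation: the abstract does not state how chains are formed or sized, so the fixed chain length `chain_len` and the function names `reservation_units` and `split_into_chains` are assumptions made purely for illustration.

```python
from math import ceil


def reservation_units(flow_packets: int, granularity: str, chain_len: int = 4) -> int:
    """Count the separately scheduled units for one flow.

    `chain_len` is a hypothetical fixed chain length; the abstract does not
    say how PCRP actually sizes a chain.
    """
    if flow_packets <= 0 or chain_len <= 0:
        raise ValueError("flow_packets and chain_len must be positive")
    if granularity == "flow":      # proactive style: one unit for the whole flow
        return 1
    if granularity == "packet":    # reactive style: every packet handled on its own
        return flow_packets
    if granularity == "chain":     # intermediate: one unit per chain of packets
        return ceil(flow_packets / chain_len)
    raise ValueError(f"unknown granularity: {granularity!r}")


def split_into_chains(flow_packets: int, chain_len: int = 4) -> list[int]:
    """Return the chain sizes for a flow; the last chain may be shorter."""
    if flow_packets <= 0 or chain_len <= 0:
        raise ValueError("flow_packets and chain_len must be positive")
    full, rest = divmod(flow_packets, chain_len)
    return [chain_len] * full + ([rest] if rest else [])
```

Under these assumptions, a 10-packet flow with `chain_len=4` is one unit at flow granularity, ten at packet granularity, and three chains of sizes 4, 4, and 2 at chain granularity, which is the sense in which a chain sits between the two extremes.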
