Data Center Networks (DCNs) suffer from synchronized bursts under segmentation and aggregation traffic patterns, which lead to buffer overflows at switches and increase network delay. To overcome this problem, congestion control algorithms such as DCTCP use Explicit Congestion Notification (ECN) to signal in-network congestion and reduce switch buffer occupancy. However, the traditional Additive Increase Multiplicative Decrease (AIMD) method causes large round-trip time (RTT) fluctuations in DCNs. Some intelligent congestion control algorithms designed for the Internet achieve great flexibility, but they are not applicable to DCNs because they lack accurate congestion feedback. In this paper, we analyze the deficiencies of using RTT as a congestion signal and the applicability of learning algorithms in DCNs. We then propose DECC, a smart TCP congestion control algorithm for DCNs that combines Deep Reinforcement Learning (DRL) with ECN to achieve high bandwidth utilization as well as low queuing delay. DECC fully exploits precise in-network feedback and formulates several QoS requirements into a multi-objective function. Meanwhile, it decouples cwnd adjustment from DRL decision making to gradually learn the optimal congestion control policy in real time. We evaluate the performance of DECC in various scenarios. Simulation results show that DECC reduces the queue length at bottleneck switches by more than 50% compared to DCTCP, while maintaining high bandwidth utilization and reducing Flow Completion Times (FCTs) under bursty traffic.
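The abstract describes two ideas without showing mechanics: a multi-objective function built from ECN-based in-network feedback, and cwnd adjustment decoupled from the slower DRL decision interval. The sketch below is a minimal illustration of that structure, not the authors' implementation; the state fields, reward weights, the `Agent` placeholder, and the per-ACK update rule are all assumptions made for exposition.

```python
# Illustrative sketch of a DECC-style control loop (assumed structure, not the paper's code):
# a DRL agent observes ECN-based congestion feedback once per decision interval and outputs
# a target scaling factor, while fast per-ACK cwnd updates inside the interval stay rule-based.

from dataclasses import dataclass
import random


@dataclass
class NetState:
    throughput: float    # achieved rate, normalized to link capacity (assumed field)
    ecn_fraction: float  # fraction of ACKs carrying ECN-Echo in the last interval
    rtt: float           # smoothed RTT in ms (assumed field)


def reward(s: NetState, w_tput: float = 1.0, w_ecn: float = 0.5, w_rtt: float = 0.1) -> float:
    """Multi-objective reward: favor high utilization, penalize ECN marks
    (a proxy for switch queue buildup) and inflated RTT. Weights are assumptions."""
    return w_tput * s.throughput - w_ecn * s.ecn_fraction - w_rtt * s.rtt


class Agent:
    """Placeholder for a DRL policy (e.g., an actor-critic network).
    Here it returns a random multiplicative factor so the sketch runs standalone."""

    def act(self, s: NetState) -> float:
        return random.uniform(0.8, 1.2)


def run_interval(agent: Agent, cwnd: float, acks: list) -> float:
    """One DRL decision interval: the agent picks a target scaling once per interval,
    and per-ACK adjustments move cwnd gradually toward that target. This decouples
    fast cwnd updates from the slower learning-based decisions."""
    ecn_marked = sum(1 for a in acks if a["ece"])
    state = NetState(
        throughput=0.9,                                  # stand-in measurement
        ecn_fraction=ecn_marked / max(len(acks), 1),
        rtt=0.2,                                         # stand-in measurement
    )
    factor = agent.act(state)
    target = cwnd * factor
    for _ in acks:
        # Fast path: small per-ACK step toward the agent's target window.
        cwnd += (target - cwnd) / max(len(acks), 1)
    print(f"reward={reward(state):.3f}  new cwnd={cwnd:.1f}")
    return cwnd


if __name__ == "__main__":
    cwnd = 10.0
    acks = [{"ece": random.random() < 0.1} for _ in range(50)]
    cwnd = run_interval(Agent(), cwnd, acks)
```

The split between the per-interval agent decision and the per-ACK window update mirrors the decoupling the abstract mentions; the actual state representation, reward terms, and update rule are specified in the full paper.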