Abstract

Virtual machines (VMs) are widely used to provide elastic computing services in datacenters, and they still rely heavily on TCP for congestion control. VM scheduling delays caused by CPU sharing can trigger frequent spurious retransmission timeouts (RTOs). We find that current detection methods cannot effectively identify such spurious RTOs because of the retransmission ambiguity introduced by the delayed ACK (DelACK) mechanism. Disabling DelACK would add significant CPU overhead to the VMs and thus degrade network performance. In this paper, we first report our practical experience with TCP’s reaction to VM scheduling delays. We then analyze the problem in two parts, corresponding to VM preemption on the sender side and on the receiver side, respectively. Finally, we propose PVTCP, a ParaVirtualized approach that counteracts the distortion of congestion information caused by the hypervisor scheduler. PVTCP is implemented entirely in the guest OS and requires no modification to the hypervisor. Taking incast congestion as an example, we evaluate our solution on a 21-node testbed. The results show that PVTCP adapts well to virtualized environments and effectively mitigates the throughput collapse problem.
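To illustrate why a scheduling delay can trigger a timeout even when no packet is lost, the following is a minimal sketch of the standard RTO estimator (smoothed RTT plus four times the RTT variance, with gains 1/8 and 1/4). It is not PVTCP and not the evaluation code. The class name `RtoEstimator`, the helper `is_spurious_timeout`, the 10 ms minimum RTO, the 1 ms network RTT, and the 30 ms scheduling delay are all assumptions chosen for illustration only.

```python
class RtoEstimator:
    """Standard smoothed-RTT retransmission timer (illustrative sketch)."""

    def __init__(self, min_rto=0.2, granularity=0.001):
        self.srtt = None              # smoothed RTT, in seconds
        self.rttvar = None            # RTT variance estimate, in seconds
        self.min_rto = min_rto        # lower bound on the timer (assumed value)
        self.granularity = granularity  # clock granularity G (assumed value)

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated before the smoothed RTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt
        return self.rto()

    def rto(self):
        raw = self.srtt + max(self.granularity, 4 * self.rttvar)
        return max(self.min_rto, raw)


def is_spurious_timeout(network_rtt, scheduling_delay, rto):
    """True if the ACK is not lost but arrives only after the timer fires."""
    return network_rtt + scheduling_delay > rto


# Hypothetical datacenter flow: 1 ms RTT samples, 10 ms minimum RTO.
estimator = RtoEstimator(min_rto=0.010)
for _ in range(20):
    current_rto = estimator.update(0.001)

# An assumed 30 ms preemption pushes the ACK past the timer.
timed_out = is_spurious_timeout(0.001, 0.030, current_rto)
```

In this sketch the timer settles at its floor because the sampled RTTs are small and stable, so any preemption longer than that floor fires the timer although the segment and its ACK were both delivered.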
