Abstract

Inefficient bandwidth sharing among application frameworks (e.g., MapReduce and Spark) in a datacenter network can lead to inelastic and skewed use of link bandwidth and to longer application completion times. Existing work, however, either focuses solely on managing computation and storage resources or controls only the sending/receiving rates at hosts. In this paper, we present CoMan, a solution that provides global in-network bandwidth management in multiplexed datacenters, with two goals: improving bandwidth utilization and reducing application completion time. CoMan first introduces a novel abstraction, virtual link groups (VLGs), to establish a shared bandwidth resource pool. Based on this pool, CoMan implements a three-level bandwidth allocation model, which enables elastic bandwidth sharing among computing frameworks while guaranteeing network performance for the applications. CoMan further improves bandwidth utilization by devising a VLG dependency graph and solving an optimization problem that guides path selection, using a $\frac{3}{2}$-approximation algorithm. We conduct comprehensive trace-driven simulations as well as small-scale testbed experiments to evaluate the performance of CoMan. Extensive simulation results show that, compared to the ECMP + ElasticSwitch solution, CoMan improves bandwidth utilization by up to $2.83\times$ and speeds up application completion by up to $6.68\times$. Our implementation also verifies that CoMan can realistically speed up application completion by $2.32\times$ on average.
