Many important cloud services require replicating massive data from one datacenter (DC) to multiple DCs. While the performance of pair-wise inter-DC data transfers has been much improved, prior solutions are insufficient to optimize bulk-data multicast, as they fail to explore the rich inter-DC overlay paths that exist in geo-distributed DCs, as well as the remaining bandwidth reserved for online traffic under fixed bandwidth separation scheme. To take advantage of these opportunities, we present BDS+, a near-optimal network system for large-scale inter-DC data replication. BDS+ is an application-level multicast overlay network with a fully centralized architecture, allowing a central controller to maintain an up-to-date global view of data delivery status of intermediate servers, in order to fully utilize the available overlay paths. Furthermore, in each overlay path, it leverages dynamic bandwidth separation to make use of the remaining available bandwidth reserved for online traffic. By constantly estimating online traffic demand and rescheduling bulk-data transfers accordingly, BDS+ can further speed up the massive data multicast. Through a pilot deployment in one of the largest online service providers and large-scale real-trace simulations, we show that BDS+ can achieve 3- 5× speedup over the provider's existing system and several well-known overlay routing baselines of static bandwidth separation. Moreover, dynamic bandwidth separation can further reduce the completion time of bulk data transfers by 1.2 to 1.3 times.
Read full abstract