Abstract

With growing data volumes and the scaling of data center clusters, communication resources often become a bottleneck in service provisioning for many MapReduce applications (e.g., training machine learning models). Therefore, data placements that bring data blocks closer to data consumers (e.g., MapReduce applications) are seen as a promising solution. In this article, we propose an efficient data-placement technique that considers network traffic reduction as well as QoS guarantees for the data blocks to optimize the communication resources. We first formulate the joint optimization of the data-placement problem, propose a generic model for minimizing communication costs, and show that the joint data-placement problem is NP-hard. To solve this problem, we propose a heuristic algorithm considering traffic flows in the network topology of data centers by first seeking optimal QoS-aware data placement based on golden division on a Zipflike replica distribution, then transforming the joint data-placement problem into a block-dependence tree (BDT) construction problem, and finally reducing the BDT construction to a graph-partitioning problem. The experimental results demonstrate that our data-placement approach could effectively improve the performance of MapReduce jobs with lower communication costs and less job execution time for big-data processing.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call