Abstract

Large-scale data-intensive applications serve users by routing service requests to geographically distributed data centers interconnected by Internet links. To achieve good reliability and low data access latency, cloud service providers often place multiple copies of the data in different data centers at the same time. The network communication required to keep these copies up to date incurs an operational cost. Penalties for violating the delay bounds of a Service Level Agreement (SLA) on data access impose a further operational cost on service providers. In this paper, we tackle the problem of data placement in distributed data centers, with the aim of minimizing the operational cost incurred by delay SLA violation penalties and inter-data center network communication, assuming each data item has K replicas. We propose a K-level Cluster-based Data Placement algorithm (K-CDP) for this problem. The algorithm first solves the linear programming relaxation, and the corresponding dual program, of the problem of minimizing the SLA violation penalty cost incurred by placing one replica of each data item in a data center. Based on the solutions obtained, the algorithm clusters the data items so that items with similar sets of candidate data centers form a data cluster. For the data items in each cluster, the algorithm then selects K data centers so as to minimize the operational cost. We prove that K-CDP is a 2-approximation algorithm for the data placement problem. Our simulation results demonstrate that the proposed algorithm effectively reduces the penalty cost incurred by delay SLA violations, the network communication cost, and the overall operational cost of data centers.
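To make the cost trade-off concrete, the following is a minimal sketch of the objective for a single data item. It is not the K-CDP algorithm: it replaces the LP relaxation and clustering steps with an exhaustive search over all K-subsets of data centers. The cost model is an assumption made for illustration: each user region reads from its lowest-latency replica, a fixed penalty applies when that latency exceeds the SLA bound, and updates are exchanged between every pair of replicas. All names and numbers (`operational_cost`, `best_placement`, `latency`, `comm_cost`) are hypothetical.

```python
from itertools import combinations


def operational_cost(replicas, latency, sla_bound, penalty, comm_cost):
    """Cost of one placement: SLA violation penalties plus replica-sync traffic."""
    # Assumption: each user region is served by its lowest-latency replica,
    # and pays a fixed penalty if even that replica misses the SLA bound.
    sla = sum(
        penalty
        for region_latency in latency
        if min(region_latency[dc] for dc in replicas) > sla_bound
    )
    # Assumption: updates are exchanged between every pair of replicas.
    sync = sum(comm_cost[a][b] for a, b in combinations(sorted(replicas), 2))
    return sla + sync


def best_placement(num_dcs, k, latency, sla_bound, penalty, comm_cost):
    """Exhaustively pick the K data centers with minimum operational cost."""
    return min(
        combinations(range(num_dcs), k),
        key=lambda reps: operational_cost(
            reps, latency, sla_bound, penalty, comm_cost
        ),
    )


# Toy instance: 2 user regions, 3 data centers, K = 2 replicas.
latency = [[10, 50, 80], [90, 40, 20]]      # latency[region][data_center]
comm_cost = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]  # symmetric inter-DC sync cost

chosen = best_placement(3, 2, latency, sla_bound=45, penalty=5.0, comm_cost=comm_cost)
print(chosen)  # (0, 1)
```

In this toy instance, placing replicas in data centers 0 and 1 meets the delay bound for both regions at a synchronization cost of 1, whereas the pair (1, 2) has cheaper links than (0, 2) but triggers an SLA penalty for the first region. Exhaustive search is only workable at this scale, which is the motivation for an approximation algorithm such as the one the paper proposes.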
