Abstract
Replication technology is commonly used to improve data availability and reduce data access latency in cloud storage systems by providing users with multiple replicas of the same service. Most current approaches focus largely on improving system performance and neglect management cost when deciding the number of replicas and their storage locations. This imposes a heavy financial burden on cloud users because, under a pay-as-you-go paradigm, the cost of replica storage and consistency maintenance grows rapidly as the number of replicas increases. In this paper, aiming to approach the minimum data set management cost benchmark in a practical manner, we propose a replica placement strategy from a cost-effectiveness perspective, under the premise that system performance requirements are met. First, we design data set management cost models covering storage cost and transfer cost. Second, we use access frequency and average response time to decide which data sets should be replicated. Then, based on a location problem graph, we propose a method for calculating the number of replicas and their storage locations with minimum management cost. Both theoretical analysis and simulations show that the proposed strategy achieves lower management cost with fewer replicas.
Highlights
Today, several cloud providers offer storage as a service, such as Amazon S3 [1], Google Cloud Storage (GCS) [2], and Microsoft Azure [3]
The experiments were conducted in a cloud computing simulation environment built on the computing facilities at the Network & Information Security Lab, Shandong University of Finance and Economics (SDUFE), China, which is constructed based on SwinDeW [28] and SwinDeW-G [29,30,31]
From the above experimental and simulation results, the following conclusions can be drawn: (1) the proposed data set replica placement strategy effectively reduces the management cost of application data sets; (2) the proposed data replica strategy reduces the number of replicas; (3) the proposed data replica strategy can effectively achieve system load balance by placing popular data files according to cost and user access history
Summary
Several cloud providers offer storage as a service, such as Amazon S3 [1], Google Cloud Storage (GCS) [2], and Microsoft Azure [3]. Replication technology has been commonly used to minimize communication latency by bringing copies of data sets close to the clients [4]. Replicas provide data availability, increased fault tolerance, improved scalability, and reduced response time and bandwidth consumption. Most current replication approaches focus largely on improving reliability and availability [10, 11] by providing users with different replicas of the same service, ignoring the management cost spent on replicas, which causes a heavy financial burden (storage cost, transfer cost, etc.) for cloud users and for CSPs. It is obvious that client access latency can be reduced as the number of replicas increases. The main contributions of this paper include (1) proposing data set management cost models covering storage cost and transfer cost; (2) presenting a novel global data set replica placement strategy from a cost-effectiveness perspective, named MCRP, which is an approximate minimum-cost solution; and (3) evaluating replica placement algorithms through analysis and simulations.
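To make the two ideas above concrete, the following is a minimal sketch of the kind of cost model and replication test the paper describes: total management cost as storage cost plus transfer cost, and a data set considered for replication when its access frequency is high and its average response time violates a target. All function names, rates, and thresholds here are illustrative assumptions for a pay-as-you-go setting, not the paper's actual MCRP algorithm or parameter values.

```python
def management_cost(size_gb, months, accesses, transfer_gb_per_access,
                    storage_rate=0.023, transfer_rate=0.09):
    """Management cost of one replica = storage cost + transfer cost.

    Rates are hypothetical pay-as-you-go prices (per GB-month and per GB
    transferred), chosen only for illustration.
    """
    storage_cost = size_gb * storage_rate * months
    transfer_cost = accesses * transfer_gb_per_access * transfer_rate
    return storage_cost + transfer_cost


def should_replicate(access_freq, avg_response_ms,
                     freq_threshold=100, latency_threshold_ms=200):
    """Replicate only popular data sets whose latency exceeds the target."""
    return access_freq >= freq_threshold and avg_response_ms > latency_threshold_ms


# A 50 GB data set held for one month and served 1000 times at 0.5 GB each:
# 50 * 0.023 * 1 + 1000 * 0.5 * 0.09 = 1.15 + 45.0 = 46.15
cost = management_cost(size_gb=50, months=1, accesses=1000,
                       transfer_gb_per_access=0.5)
print(round(cost, 2))  # 46.15

# Hot and slow -> replicate; cold -> keep a single copy.
print(should_replicate(access_freq=500, avg_response_ms=350))  # True
print(should_replicate(access_freq=10, avg_response_ms=350))   # False
```

Under such a model, adding a replica trades extra storage cost against reduced transfer cost and latency, which is the trade-off MCRP optimizes over the location problem graph.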