Abstract

Increased data availability and high data-access performance are of utmost importance in large-scale distributed systems such as data clouds. To address these issues, data can be replicated at various locations in the system where applications execute. Replication not only improves data availability and access latency but also improves system load balancing. While data replication in distributed cloud storage has been addressed in the literature, the majority of current techniques do not consider the different costs and benefits of replication from a comprehensive perspective. In this paper, we investigate the replica management problem (formulated using dynamic programming) in cloud computing environments to support big data applications. To this end, we propose a new, highly distributed replica placement algorithm that provides cost-effective replication of huge amounts of geographically distributed data in the cloud to meet the quality-of-service (QoS) requirements of data-intensive (big data) applications while ensuring that the workload among the replica data centers is balanced. In addition, the algorithm takes into account consistency among replicas due to update propagation. We thus build a multi-objective optimization approach for replica management in the cloud that seeks a near-optimal solution by balancing the trade-offs among the stated issues. To verify its effectiveness, we evaluated the algorithm's performance and compared it with two baseline approaches from the literature. The evaluation results demonstrate the usefulness and superiority of the presented algorithm under the conditions of interest.

Highlights

  • Cloud computing has become an attractive and mainstream solution for data storage, processing, and distribution [1]

  • We present a fully distributed approach to data replication that uses a multi-objective model in the cloud, seeking a near-optimal solution by minimizing total replication cost and balancing the trade-offs among the stated objectives: the quality-of-service (QoS) requirements of applications, the workload of replica data center nodes, and the consistency of created replicas

  • Our multi-objective data replication technique is designed based on the Hadoop Distributed File System (HDFS) architecture, and it is assumed that the different cloud computing data centers are placed in different geographical locations (Fig. 1)


Summary

INTRODUCTION

Cloud computing has become an attractive and mainstream solution for data storage, processing, and distribution [1]. Several critical issues must be addressed to achieve big data replication in cloud storage, including determining the number of data replicas that should be created in the cloud to meet reasonable system and application requirements. Given these issues and trends, in this paper we investigate the replica management problem in cloud computing environments to support big data applications from a holistic view. To this end, we provide cost-effective replication of large amounts of geographically distributed data in the cloud to meet the quality-of-service (QoS) requirements of data-intensive (big data) applications while ensuring that the workload among the replica data centers is balanced.
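To make the trade-off concrete, the sketch below scores candidate data centers for a new replica by combining the three stated objectives (access latency as a QoS proxy, current workload, and update-propagation cost for consistency) into a weighted cost and picking the cheapest candidate. This is an illustrative toy, not the paper's actual dynamic-programming algorithm; all names, weights, and the linear cost model are hypothetical assumptions.

```python
# Hypothetical multi-objective replica placement sketch (not the paper's
# algorithm): each candidate data center is scored by a weighted sum of
# QoS (latency), load, and consistency (update-propagation) cost.
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    latency_ms: float   # average access latency to requesting applications
    load: float         # current workload, normalized to [0, 1]
    update_cost: float  # cost of propagating updates to a replica placed here


def replica_cost(dc: DataCenter,
                 w_qos: float = 0.5,
                 w_load: float = 0.3,
                 w_cons: float = 0.2) -> float:
    """Weighted sum of the three objectives; lower is better.

    Load is rescaled to roughly the same magnitude as latency so that
    no single term dominates (an arbitrary modeling choice here).
    """
    return w_qos * dc.latency_ms + w_load * dc.load * 100 + w_cons * dc.update_cost


def place_replica(candidates: list[DataCenter]) -> DataCenter:
    """Greedy placement: choose the candidate with the lowest combined cost."""
    return min(candidates, key=replica_cost)


if __name__ == "__main__":
    candidates = [
        DataCenter("dc-east", latency_ms=20, load=0.9, update_cost=30),
        DataCenter("dc-west", latency_ms=35, load=0.2, update_cost=25),
        DataCenter("dc-eu",   latency_ms=80, load=0.1, update_cost=60),
    ]
    best = place_replica(candidates)
    print(best.name)  # dc-east is fast but heavily loaded, so dc-west wins
```

Note how the weights encode the trade-off: with a higher `w_load`, the fast but busy `dc-east` is penalized and the lightly loaded `dc-west` is selected, which mirrors the paper's goal of balancing workload across replica data centers rather than optimizing latency alone.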

RELATED WORK
SYSTEM DESIGN AND ARCHITECTURE
Data Access Cost
Problem Definition
QOS-AWARE REPLICA PLACEMENT ALGORITHM
Calculation of Replica Cost and Location
Placing Replicas
Complexity Analysis
SIMULATION SETUP
PERFORMANCE RESULTS
Job Execution Time
Average Bandwidth Use
Storage Use
CONCLUSIONS AND FUTURE WORK

