Abstract
Increased data availability and high data access performance are of utmost importance in a large-scale distributed system such as a data cloud. To address these issues, data can be replicated at various locations in the system where applications are executed. Replication not only improves data availability and access latency but also improves system load balancing. While data replication in distributed cloud storage has been addressed in the literature, the majority of current techniques do not consider the different costs and benefits of replication from a comprehensive perspective. In this paper, we investigate the replica management problem (which we formulate using dynamic programming) in cloud computing environments to support big data applications. To this end, we propose a new, highly distributed replica placement algorithm that provides cost-effective replication of huge amounts of geographically distributed data in the cloud to meet the quality of service (QoS) requirements of data-intensive (big data) applications while ensuring that the workload among the replica data centers is balanced. In addition, the algorithm takes into account the consistency among replicas due to update propagation. Thus, we build a multi-objective optimization approach for replica management in the cloud that seeks a near-optimal solution by balancing the trade-offs among the stated issues. To verify the effectiveness of the algorithm, we evaluated its performance and compared it with two baseline approaches from the literature. The evaluation results demonstrate the usefulness and superiority of the presented algorithm under the conditions of interest.
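The abstract describes a multi-objective model that weighs replication cost against QoS, load balance, and consistency. The paper's exact formulation is not reproduced here; the following minimal Python sketch only illustrates one plausible weighted combination of such terms. The DataCenter fields, the weights, and each component cost are illustrative assumptions, not the authors' actual cost model.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class DataCenter:
    name: str
    latency_ms: float      # average access latency seen by the served applications
    load: float            # current workload fraction (0..1)
    storage_cost: float    # cost of hosting one replica at this data center

def replication_cost(replica_sites, qos_latency_ms, update_rate,
                     w_storage=1.0, w_qos=1.0, w_balance=1.0, w_consistency=1.0):
    """Weighted multi-objective cost of a candidate replica placement.

    The weights and component definitions are illustrative assumptions,
    not the paper's actual formulation.
    """
    # 1) Storage/replication cost grows with the number of replicas.
    storage = sum(dc.storage_cost for dc in replica_sites)

    # 2) QoS penalty: how far each replica's access latency exceeds the requirement.
    qos_penalty = sum(max(0.0, dc.latency_ms - qos_latency_ms) for dc in replica_sites)

    # 3) Load-balancing term: dispersion of workload among the chosen replica sites.
    imbalance = pstdev([dc.load for dc in replica_sites]) if len(replica_sites) > 1 else 0.0

    # 4) Consistency term: update-propagation overhead rises with the replica degree.
    consistency = update_rate * max(0, len(replica_sites) - 1)

    return (w_storage * storage + w_qos * qos_penalty +
            w_balance * imbalance + w_consistency * consistency)

# Example: compare two candidate placements for one data object.
dcs = [DataCenter("dc-eu", 40, 0.6, 2.0), DataCenter("dc-us", 90, 0.3, 1.5),
       DataCenter("dc-asia", 120, 0.8, 1.0)]
print(replication_cost(dcs[:2], qos_latency_ms=80, update_rate=0.5))
print(replication_cost(dcs, qos_latency_ms=80, update_rate=0.5))
```

A placement algorithm would evaluate such a cost for candidate replica sets and keep the cheapest one that still satisfies the QoS constraints.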
Highlights
Cloud computing has become an attractive and mainstream solution for data storage, processing, and distribution [1]
We present a fully distributed approach to data replication based on a multi-objective model in the cloud that seeks a near-optimal solution by minimizing the total replication cost and balancing the trade-offs among the stated objectives: the quality of service (QoS) requirements of applications, the workload of replica data center nodes, and the consistency of the created replicas
Our multi-objective data replication technique is designed on top of the Hadoop Distributed File System (HDFS) architecture, and it is assumed that the cloud computing data centers are placed in different geographical locations (Fig. 1)
Summary
Cloud computing has become an attractive and mainstream solution for data storage, processing, and distribution [1]. There are a number of critical issues that need to be addressed to achieve big data replication in cloud storage, among them: i) determining the degree of data replicas that should be created in the cloud to meet reasonable system and application requirements. Given the issues and trends stated above, in this paper we investigate the replica management problem in cloud computing environments to support big data applications from a holistic view. To this end, we provide cost-effective replication of large amounts of geographically distributed data in the cloud to meet the quality of service (QoS) requirements of data-intensive (big data) applications while ensuring that the workload among the replica data centers is balanced.
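The abstract states that the replica management problem is formulated using dynamic programming, but the recurrence itself is not given on this page. As a rough illustration only, the toy dynamic program below picks the cheapest subset of candidate data centers that satisfies a required replica count; the site costs, the required_replicas parameter, and the recurrence are assumptions for illustration and do not reproduce the paper's formulation, which also folds in the QoS, load-balancing, and consistency objectives.

```python
from functools import lru_cache

def min_cost_placement(site_costs, required_replicas):
    """Toy DP: choose which candidate data centers host a replica so that at
    least `required_replicas` copies exist at minimum total storage cost.

    State = (candidate site index, replicas still needed); this is a sketch of
    how the placement decision could be organized, not the paper's model.
    """
    n = len(site_costs)

    @lru_cache(maxsize=None)
    def solve(i, needed):
        if needed == 0:
            return 0.0, ()            # all required replicas are placed
        if i == n:
            return float("inf"), ()   # ran out of candidate sites
        # Option 1: skip candidate site i.
        skip_cost, skip_sites = solve(i + 1, needed)
        # Option 2: place a replica at site i and pay its cost.
        take_cost, take_sites = solve(i + 1, needed - 1)
        take_cost += site_costs[i]
        if take_cost < skip_cost:
            return take_cost, (i,) + take_sites
        return skip_cost, skip_sites

    return solve(0, required_replicas)

# Example: five candidate data centers, keep at least three replicas.
print(min_cost_placement((4.0, 2.5, 3.0, 5.0, 1.5), 3))  # -> (7.0, (1, 2, 4))
```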