Abstract

<p>Scientific applications now generate huge amounts of data, on the order of terabytes or petabytes. Data Grids are a currently proposed solution to large-scale data management problems, including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve job response time and data availability. Determining a reasonable number of replicas and the right locations for them has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on temporal and geographical locality is proposed. Its four phases are: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity of a data file passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data at each site, and placing replicas among those sites; 4) removing the files whose deletion costs the least in terms of average access time when there is insufficient space for replication. The algorithm was tested using OptorSim, a grid simulator developed by the European Data Grid Project. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage and percentage of storage filled.</p>
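Phases 2 and 4 are the two quantitative steps in the abstract: how many new replicas to create, and which files to delete when space runs out. The following is a minimal sketch of both, not the paper's actual model. It assumes that each replica site is up independently with the same probability `p`, so a file with `r` replicas has availability 1 - (1 - p)^r, and it assumes that the cost of deleting a replica is its access rate times the extra time needed to fetch the file from the nearest other copy. All names (`replicas_needed`, `StoredFile`, `choose_victims`, and so on) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def availability(p: float, r: int) -> float:
    """Availability of a file with r replicas, ASSUMING each replica site
    is up independently with probability p."""
    return 1.0 - (1.0 - p) ** r


def replicas_needed(p: float, target: float, limit: int = 100) -> int:
    """Smallest r with availability(p, r) >= target. A linear search is
    used instead of a closed-form logarithm to avoid rounding surprises."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    r = 1
    while availability(p, r) < target:
        r += 1
        if r > limit:
            raise ValueError("target not reachable within limit")
    return r


def new_replicas(p: float, target: float, existing: int) -> int:
    """Phase 2 (sketch): additional replicas to create, never negative."""
    return max(0, replicas_needed(p, target) - existing)


@dataclass
class StoredFile:
    """Hypothetical per-site record of a locally stored replica."""
    name: str
    size: float          # MB occupied on the local storage element
    access_rate: float   # local requests per unit time
    remote_time: float   # seconds to fetch from the nearest other replica
    local_time: float    # seconds to read the local copy
    other_copies: int    # replicas of this file held elsewhere

    def deletion_cost(self) -> float:
        """ASSUMED cost: expected increase in access time per unit time
        if this replica is removed."""
        return self.access_rate * (self.remote_time - self.local_time)


def choose_victims(files: list[StoredFile], free: float,
                   needed: float) -> list[str] | None:
    """Phase 4 (sketch): greedily evict the cheapest replicas until `needed`
    MB are free. Files with no copy elsewhere are never evicted. Returns
    None when enough space cannot be freed."""
    victims: list[str] = []
    evictable = sorted((f for f in files if f.other_copies > 0),
                       key=lambda f: (f.deletion_cost(), f.name))
    for f in evictable:
        if free >= needed:
            break
        victims.append(f.name)
        free += f.size
    return victims if free >= needed else None
```

For example, with `p = 0.5` and a target availability of 0.9, `replicas_needed` returns 4, because three replicas give only 0.875. The greedy eviction mirrors the idea of removing the files with the least access-time cost first. A real implementation would also need to decide whether replicating is worth more than the replicas it displaces.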

Highlights

  • Today, the efficient management of huge distributed and shared data resources across wide area networks has become a significant topic for both scientific research and commercial applications

  • In this paper, a Four-Phase Data Replication Algorithm named 4PDRA, based on temporal and geographical locality, is proposed

  • To evaluate the performance of the proposed 4PDRA algorithm, the simulation environment and parameter setup are described, followed by detailed performance evaluation results


Summary

Introduction

Efficiently managing huge distributed and shared data resources across wide area networks has become a significant topic for both scientific research and commercial applications. Data replication is a key technique for managing large data sets in a distributed manner: it creates copies (replicas) of data so that the data can be accessed faster. The drawback of static replication is evident: when client access patterns change greatly in the Data Grid, the benefits brought by the replicas decrease sharply. Dynamic replication takes changes in the Grid environment into consideration, automatically creating new replicas for popular data files or moving replicas to other sites when necessary to improve performance. Because Data Grids are dynamic environments, dynamic replication is more suitable for them [8], [12]. In this paper, a Four-Phase Data Replication Algorithm named 4PDRA, based on temporal and geographical locality, is proposed.
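Dynamic replication depends on deciding which files are currently popular and where they are requested. This corresponds to phases 1 and 3 of 4PDRA as summarized in the abstract. The sketch below illustrates those ideas under stated assumptions and does not reproduce the paper's exact rules. Temporal locality is modeled here as an exponentially decaying weight with a hypothetical `half_life`. The "dynamic threshold" is assumed to be the mean popularity over all files. Geographical locality is approximated by ranking the sites that requested a file most often. `AccessLog`, `popularity`, `popular_files` and `candidate_sites` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AccessLog:
    """Hypothetical access history of (time, site, file) records."""
    records: list[tuple[float, str, str]] = field(default_factory=list)

    def add(self, time: float, site: str, file: str) -> None:
        self.records.append((time, site, file))


def popularity(log: AccessLog, now: float, half_life: float) -> dict[str, float]:
    """Temporal locality (ASSUMED weighting): each access counts
    2 ** (-(now - t) / half_life), so recent accesses weigh more."""
    scores: dict[str, float] = defaultdict(float)
    for t, _site, f in log.records:
        scores[f] += 2.0 ** (-(now - t) / half_life)
    return dict(scores)


def popular_files(scores: dict[str, float]) -> list[str]:
    """Phase 1 (sketch): files above a dynamic threshold, ASSUMED here to be
    the mean popularity, so it moves with the overall load. Most popular first."""
    if not scores:
        return []
    threshold = sum(scores.values()) / len(scores)
    return sorted((f for f, s in scores.items() if s > threshold),
                  key=lambda f: (-scores[f], f))


def candidate_sites(log: AccessLog, file: str, k: int) -> list[str]:
    """Phase 3 (sketch): the k sites that requested `file` most often,
    used as placement candidates for new replicas."""
    counts: dict[str, int] = defaultdict(int)
    for _t, site, f in log.records:
        if f == file:
            counts[site] += 1
    return sorted(counts, key=lambda s: (-counts[s], s))[:k]
```

Suppose site B requested `f1` three times recently and site A requested it twice. If `f1` is the only file above the mean, it triggers replication, and `candidate_sites(log, "f1", 2)` suggests B and then A as destinations. Because the threshold is the mean, it rises and falls with overall demand instead of being fixed in advance.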

Related works
Problem assumption
Availability
Four-phase data replication algorithm
Decide which and when to replicate
Determine the number of new replicas
Placement of new replicas
Deletion of old replicas
4PDRA algorithm
Simulation tool and parameter setup
Evaluation metrics
Simulation results and discussion
Conclusion and future works