Abstract
<p>Scientific applications now generate data on the scale of terabytes or petabytes. Data Grids have been proposed as a solution to large-scale data management problems, including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve job response time and data availability. Choosing a reasonable number of replicas and the right locations for them has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on temporal and geographical locality is proposed. It includes: 1) evaluating and identifying popular data and triggering a replication operation when the popularity of the data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing the files with the least cost in average access time when there is insufficient space for replication. The algorithm was tested using the grid simulator OptorSim, developed by the European Data Grid project. The simulation results show that the proposed algorithm performs better than other algorithms in terms of job execution time, effective network usage, and percentage of storage filled.</p>
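Phase 2 relates system availability to the number of replicas. The abstract does not give the paper's actual model, so the following is only a minimal sketch under a common textbook assumption: every site holding a replica is independently available with the same probability p, so a file with n replicas is available with probability 1 - (1 - p)^n. The function name `min_replicas` and its parameters are hypothetical and do not come from the paper.

```python
def min_replicas(site_availability: float, target: float, max_replicas: int = 1000) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target.

    Assumption (not from the paper): replicas fail independently and every
    site has the same availability p = site_availability.
    """
    if not 0.0 < site_availability < 1.0:
        raise ValueError("site_availability must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")

    unavailable = 1.0  # probability that all replicas so far are unreachable
    for n in range(1, max_replicas + 1):
        unavailable *= 1.0 - site_availability
        if 1.0 - unavailable >= target:
            return n
    raise ValueError("target not reachable within max_replicas")


if __name__ == "__main__":
    # With 80% site availability, how many replicas reach 99% file availability?
    print(min_replicas(0.8, 0.99))
```

Under this assumption each extra replica cuts the remaining unavailability by a constant factor, so returns diminish quickly. That is one reason a replication scheme would compute a suitable number of replicas rather than replicate everywhere.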
Highlights
Efficient management of huge distributed and shared data resources across wide area networks has become a significant topic for both scientific research and commercial applications
In this paper we present a four-phase data replication algorithm, named 4PDRA, based on temporal and geographical locality
To evaluate the performance of the proposed 4PDRA algorithm, the simulation environment and parameter setup are discussed, followed by the performance evaluation results
Summary
Efficient management of huge distributed and shared data resources across wide area networks has become a significant topic for both scientific research and commercial applications. Data replication is a key technique for managing large data in a distributed manner; that is, copies of a data file are created so that it can be accessed faster. The drawback of static replication is evident: when client access patterns change greatly in the Data Grid, the benefits brought by the replicas decrease sharply. Dynamic replication takes into account changes in the Grid environment and automatically creates new replicas for popular data files, or moves replicas to other sites when necessary, to improve performance. Data Grids are dynamic environments, so dynamic replication is more suitable for them [8], [12]. In this paper we present a four-phase data replication algorithm, named 4PDRA, based on temporal and geographical locality.
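The idea of replicating popular files once they pass a dynamic threshold can be illustrated with a short sketch. This is not the 4PDRA popularity formula, which the summary does not state. It assumes, purely for illustration, that popularity is an access count that decays by a fixed factor in each time interval (so recent accesses weigh more, reflecting temporal locality) and that the dynamic threshold is the mean popularity over all files seen. The names `files_to_replicate` and `decay` are hypothetical.

```python
from collections import Counter
from typing import List


def files_to_replicate(access_log: List[List[str]], decay: float = 0.5) -> List[str]:
    """Return files whose decayed popularity exceeds the mean popularity.

    access_log holds one list of accessed file names per time interval,
    oldest interval first. Both the decay rule and the mean-based threshold
    are illustrative assumptions, not the paper's definitions.
    """
    popularity = Counter()
    for interval in access_log:
        # Age existing scores so that older accesses count for less.
        for name in popularity:
            popularity[name] *= decay
        # Count the accesses observed in the current interval.
        for name in interval:
            popularity[name] += 1.0

    if not popularity:
        return []

    # Dynamic threshold: recomputed from the current scores on every call.
    threshold = sum(popularity.values()) / len(popularity)
    return sorted(name for name, score in popularity.items() if score > threshold)


if __name__ == "__main__":
    log = [["a", "a", "b"], ["a", "c"], ["a"]]
    print(files_to_replicate(log))
```

Because the threshold follows the current access pattern, a file that was popular only in old intervals eventually drops below it, which is the behavior that distinguishes dynamic from static replication.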
More From: Journal of Advanced Computer Science & Technology