Abstract

Data grids have emerged as a useful technology for managing large amounts of distributed data in fields such as scientific experiments and engineering applications. In this context, replication is an efficient technique for improving response time, reducing bandwidth consumption, and maintaining reliability. Unfortunately, most existing replication strategies operate at the granularity of a single file and neglect correlations among different data files. However, analysis of many real data-intensive applications reveals that jobs and applications request groups of correlated files. In this paper, we propose a new dynamic, periodic, decentralized data replication strategy, called RSCP (Replication Strategy based on Correlated Patterns), which uses a set of correlated files as its unit of granularity. To discover these correlations, we introduce a new algorithm, drawn from the data mining field, for mining maximal frequent correlated patterns. The data in this work is read-only, so no consistency issues arise. The evaluation metrics we analyze in the experiments are mean job execution time, effective network usage, total number of replications, hit ratio, and percentage of storage filled. Extensive experiments with the OptorSim simulator show that the proposed strategy outperforms other strategies under most access patterns.

