Abstract

Data replication in data grids is an effective technique for improving response time, reducing bandwidth consumption, and maintaining reliability. Many replication strategies have been proposed in this context. Unfortunately, most existing techniques operate at single-file granularity and neglect correlations among data files, even though file correlations are an increasingly important consideration for performance in data grids. Analysis of real data-intensive grid applications reveals that jobs request groups of correlated files, which suggests that these correlations can be exploited to improve the effectiveness of replication strategies. In this paper, we propose a new dynamic, periodic, decentralized data replication strategy, called RSBMFCP, which considers a set of correlated files as its unit of granularity. Our strategy groups files according to their simultaneous accesses by jobs and stores correlated files at the same site. To discover these correlations, we apply a maximal frequent correlated pattern mining algorithm from the data mining field, using all-confidence as the correlation measure. The proposed strategy consists of four steps: storing the file access history, converting the file access history into a logical history file, applying the maximal frequent correlated pattern mining algorithm, and performing replication and replacement. Experiments using the well-known data grid simulator OptorSim show that the proposed strategy outperforms other strategies in terms of job execution time and effective network usage.
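
To make the mining step concrete, the following is a minimal sketch and not the algorithm used by RSBMFCP. It assumes that the logical history file can be read as a binary job-by-file matrix (1 if the job accessed the file), and it uses the standard definition of all-confidence: the support of a pattern divided by the largest support of any single file in it. All function names, thresholds, and the sample data are hypothetical, and the exhaustive search is only practical for a handful of files.

```python
from itertools import combinations

def to_logical_history(access_history, files):
    """Turn per-job sets of accessed files into binary rows (assumed format)."""
    return [[1 if f in job else 0 for f in files] for job in access_history]

def support(pattern, rows, index):
    """Number of jobs (rows) that accessed every file in the pattern."""
    return sum(all(row[index[f]] for f in pattern) for row in rows)

def all_confidence(pattern, rows, index):
    """support(pattern) / max support of any single file in the pattern."""
    max_single = max(support((f,), rows, index) for f in pattern)
    return support(pattern, rows, index) / max_single if max_single else 0.0

def maximal_frequent_correlated(rows, files, min_support, min_all_conf):
    """Brute-force search, largest patterns first, keeping only maximal ones."""
    index = {f: i for i, f in enumerate(files)}
    kept = []
    for size in range(len(files), 0, -1):
        for pattern in combinations(files, size):
            # A proper subset of an already kept pattern cannot be maximal.
            if any(set(pattern) < set(k) for k in kept):
                continue
            if (support(pattern, rows, index) >= min_support
                    and all_confidence(pattern, rows, index) >= min_all_conf):
                kept.append(pattern)
    return kept

# Hypothetical access history: each set lists the files one job requested.
files = ["a", "b", "c", "d"]
jobs = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"d"}, {"c", "d"}]
rows = to_logical_history(jobs, files)
groups = maximal_frequent_correlated(rows, files, min_support=2, min_all_conf=0.6)
```

In this sketch each returned tuple is a candidate group of files to replicate together at one site. Raising `min_all_conf` splits loosely related files into smaller groups.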
