Abstract

Data Grids seek to harness geographically distributed resources for large-scale, data-intensive problems. Research in this area must address resource management for both computation and data. Computation management comprises job scheduling, load balancing, fault tolerance, and response time, while data management covers the replication and movement of data at selected sites. Because jobs are data intensive, data management issues often become integral to the problems of scheduling and effective resource management in Data Grids. Integrating data replication and scheduling strategies is therefore important. Such integrated solutions either do not exist or work in a centralized manner, which is not scalable. This paper addresses the problem of integrating scheduling and replication strategies in a distributed manner. As part of the solution, we propose a Distributed Replication and Scheduling Strategy (DistReSS), which aims at iterative improvement of performance through coupling between scheduling and replication, achieved in a distributed and hierarchical fashion. Results suggest that, in the context of our experiments, DistReSS performs comparably to the centralized approach when its parameters are tuned properly, while being more scalable than the centralized approach.

