Abstract

Data grids support geographically distributed, large-scale, data-intensive applications. Scheduling schemes for data grids attempt not only to improve data access time, but also to improve data availability at the node where data requests are generated. Data replication techniques manage large data volumes by storing a number of data files efficiently. In this paper, we propose the centralized dynamic scheduling strategy with replica placement strategies (CDSS-RPS). CDSS-RPS schedules data and tasks so as to minimize the implementation cost and data transfer time. CDSS-RPS consists of two algorithms, namely (a) centralized dynamic scheduling (CDS) and (b) the replica placement strategy (RPS). CDS considers the computing capacity of a node and finds an appropriate location for a job. RPS attempts to improve file access time through replication, on the basis of the number of accesses, the storage capacity of a computing node, and the response time of a requested file. Extensive simulations are carried out to demonstrate the effectiveness of the proposed strategy. Simulation results demonstrate that the replication and scheduling strategies significantly reduce the implementation cost and the average access time.
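
The abstract's description of RPS suggests a simple decision rule: replicate a file at a node when it is accessed frequently, its current response time is high, and the node has enough free storage. The Python sketch below illustrates that idea only; the class, function names, thresholds, and units are assumptions made for illustration, not the authors' actual implementation.

    # A minimal sketch of the replica placement idea described above.
    # All names, thresholds, and units are illustrative assumptions.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        node_id: str
        free_storage: int                      # assumed unit: MB
        replicas: set = field(default_factory=set)

    def place_replica(node: Node, file_id: str, file_size: int,
                      access_count: int, response_time: float,
                      access_threshold: int = 10,
                      response_threshold: float = 2.0) -> bool:
        """Replicate a file at `node` when it is requested often, its
        current response time is poor, and the node can store it."""
        if file_id in node.replicas:
            return False                       # already replicated here
        popular = access_count >= access_threshold
        slow = response_time >= response_threshold
        fits = file_size <= node.free_storage
        if popular and slow and fits:
            node.replicas.add(file_id)
            node.free_storage -= file_size
            return True
        return False

    # Example: a frequently accessed, slow file gets a local replica.
    node = Node("site-3", free_storage=500)
    print(place_replica(node, "dataset-A", file_size=120,
                        access_count=42, response_time=3.5))   # True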

Highlights

  • The term grid computing was first put forth by Foster and Kesselman [1,2] as the paradigm that provides reliable, consistent resource sharing and execution of jobs in distributed systems

  • We propose the Centralized Dynamic Scheduling Strategy (CDSS) for scheduling jobs, and the Replica Placement Strategy (RPS) for replica placement

  • Results are obtained via simulations and compared against plain replica transfer scheduling problem (RTSP) algorithms

Summary

Introduction

The term grid computing was first put forth by Foster and Kesselman [1,2] as the paradigm that provides reliable, consistent resource sharing and execution of jobs in distributed systems. Data files are duplicated based on user requests and the storage capacity requirements of a computing node. The scheduling strategy should minimize the transfer and deletion of data files among computing nodes. Devising such a strategy is a challenge, as users' request patterns may change very frequently in a data grid environment. Our approach minimizes data access time and implementation cost by (a) scheduling the requested tasks on the available computing nodes, and (b) minimizing the transfer time of all data files with respect to their network bandwidth consumption.
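
As a rough illustration of points (a) and (b), the following Python sketch assigns a job to the node with the lowest estimated cost, taken here as execution time (work divided by computing capacity) plus the time to transfer any missing input files over the node's link. The cost formula, node attributes, and numbers are assumptions for illustration, not the paper's exact model.

    # A rough sketch of the cost model behind points (a) and (b): assign a
    # job to the node that minimizes estimated execution time plus the time
    # to transfer its missing input files. All details are assumed.

    def transfer_time(file_size_mb: float, bandwidth_mbps: float) -> float:
        """Seconds to move a file of the given size over a node's link."""
        return (file_size_mb * 8) / bandwidth_mbps

    def schedule_job(job, nodes):
        """Return the node with the lowest combined execution + transfer cost."""
        def cost(node):
            exec_time = job["cpu_work"] / node["computing_capacity"]
            missing = [f for f in job["input_files"] if f not in node["local_files"]]
            move_time = sum(transfer_time(job["file_sizes"][f], node["bandwidth_mbps"])
                            for f in missing)
            return exec_time + move_time
        return min(nodes, key=cost)

    # Example: node B wins because the input file is already stored locally.
    job = {"cpu_work": 100.0, "input_files": ["f1"], "file_sizes": {"f1": 250.0}}
    nodes = [
        {"name": "A", "computing_capacity": 20.0, "bandwidth_mbps": 100.0, "local_files": set()},
        {"name": "B", "computing_capacity": 10.0, "bandwidth_mbps": 100.0, "local_files": {"f1"}},
    ]
    print(schedule_job(job, nodes)["name"])    # prints "B"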

Related Work
Implementation of Cost Minimization Techniques
All Random
Highest Opportunity Cost First
Greedy Object Lowest Cost First
System Model
Replicator
Replica Placement Strategy
Schedule Enhancement Operator
Illustrative Example
Simulation Results and Discussion
Conclusions and Future Work