Abstract
Data Grids deal with geographically-distributed large-scale data-intensive applications. Schemes scheduled for data grids attempt to not only improve data access time, but also aim to improve the ratio of data availability to a node, where the data requests are generated. Data replication techniques manage large data by storing a number of data files efficiently. In this paper, we propose centralized dynamic scheduling strategy-replica placement strategies (CDSS-RPS). CDSS-RPS schedule the data and task so that it minimizes the implementation cost and data transfer time. CDSS-RPS consists of two algorithms, namely (a) centralized dynamic scheduling (CDS) and (b) replica placement strategy (RPS). CDS considers the computing capacity of a node and finds an appropriate location for the job. RPS attempts to improve file access time by using replication on the basis of number of accesses, storage capacity of a computing node, and response time of a requested file. Extensive simulations are carried out to demonstrate the effectiveness of the proposed strategy. Simulation results demonstrate that the replication and scheduling strategies improve the implementation cost and average access time significantly.
Highlights
The term grid computing was first put forth by Foster and Kesselman [1,2] as the paradigm that provides reliable, consistent resource sharing and execution of jobs in the distributed systems
We propose the Centralised Dynamic Scheduling Strategy (CDSS) for the scheduling of jobs, and replica placement strategy (RPS) for replica placement
Results are formulated via simulations, with reference to the plain replica transfer scheduling problem (RTSP) algorithms
Summary
The term grid computing was first put forth by Foster and Kesselman [1,2] as the paradigm that provides reliable, consistent resource sharing and execution of jobs in the distributed systems. Data files are duplicated based on user requests and storage capacity requirements of a computing node. The scheduling strategy should be able to minimize the transfer and deletion actions of data files among computing nodes. Devising such a strategy is a challenge, as users’ request patterns may change very frequently in a data grid environment. Our approach consists of minimizing the data access time and implementation cost, i.e., (a) scheduling the tasks on the available computing node for the number of tasks requested, and (b) minimizing the transfer time of all data files with respect to their network bandwidth consumption.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.