Data location-aware job scheduling in the grid. Application to the GridWay metascheduler

Antonio Delgado Peris,Jose Hernandez,Ignacio M Llorente,Eduardo Huedo

doi:10.1088/1742-6596/219/6/062043

Abstract

Grid infrastructures constitute nowadays the core of the computing facilities of the biggest LHC experiments. These experiments produce and manage petabytes of data per year and run thousands of computing jobs every day to process that data. It is the duty of metaschedulers to allocate the tasks to the most appropriate resources at the proper time.Our work reviews the policies that have been proposed for the scheduling of grid jobs in the context of very data-intensive applications. We indicate some of the practical problems that such models will face and describe what we consider essential characteristics of an optimum scheduling system: aim to minimise not only job turnaround time but also data replication, flexibility to support different virtual organisation requirements and capability to coordinate the tasks of data placement and job allocation while keeping their execution decoupled.These ideas have guided the development of an enhanced prototype for GridWay, a general purpose metascheduler, part of the Globus Toolkit and member of the EGEE's RESPECT program. Current GridWay's scheduling algorithm is unaware of data location. Our prototype makes it possible for job requests to set data needs not only as absolute requirements but also as functions for resource ranking. As our tests show, this makes it more flexible than currently used resource brokers to implement different data-aware scheduling algorithms.

Full Text