Abstract
Grid infrastructures constitute nowadays the core of the computing facilities of the biggest LHC experiments. These experiments produce and manage petabytes of data per year and run thousands of computing jobs every day to process that data. It is the duty of metaschedulers to allocate the tasks to the most appropriate resources at the proper time.Our work reviews the policies that have been proposed for the scheduling of grid jobs in the context of very data-intensive applications. We indicate some of the practical problems that such models will face and describe what we consider essential characteristics of an optimum scheduling system: aim to minimise not only job turnaround time but also data replication, flexibility to support different virtual organisation requirements and capability to coordinate the tasks of data placement and job allocation while keeping their execution decoupled.These ideas have guided the development of an enhanced prototype for GridWay, a general purpose metascheduler, part of the Globus Toolkit and member of the EGEE's RESPECT program. Current GridWay's scheduling algorithm is unaware of data location. Our prototype makes it possible for job requests to set data needs not only as absolute requirements but also as functions for resource ranking. As our tests show, this makes it more flexible than currently used resource brokers to implement different data-aware scheduling algorithms.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.