Abstract

Parallel job execution in the grid environment using MPI technology presents a number of challenges for the sites providing this support. Multiple flavors of the MPI libraries, shared working directories required by certain applications, and special settings for the batch systems make MPI support difficult for site managers. At the same time, workload management systems based on Pilot Jobs have become ubiquitous, although the Pilot frameworks did not support MPI applications. This support was recently added to the DIRAC Project in the context of the GISELA Latin American Grid Initiative. Special services for dynamic allocation of virtual computer pools on grid sites were developed in order to deploy MPI rings that match the requirements of the jobs in the central task queue of the DIRAC Workload Management System. Pilot Jobs use user-space file system techniques to install the required MPI software automatically. The same technique is used to emulate shared working directories for the parallel MPI processes. This makes it possible to execute MPI jobs even on sites that do not officially support them. Reusing the MPI rings constructed in this way for a series of parallel jobs dramatically improves their efficiency and turnaround. In this contribution we describe the design and implementation of the DIRAC MPI Service as well as its support for various types of MPI libraries. The advantages of coupling MPI support with the Pilot frameworks are outlined, and examples of usage with real applications are presented.
