Abstract

The present paper discusses the design and application of GridX1, a computational grid project which uses shared resources at several Canadian research institutions. The infrastructure of GridX1 is built using off-the-shelf Globus Toolkit 2 middleware, a MyProxy credential server, and a resource broker based on Condor-G to manage the distributed computing environment. The broker-based job scheduling and management functionality are exposed as a Globus GRAM job service. Resource brokering is based on the Condor matchmaking mechanism, whereby job and resource attributes are expressed as ClassAds, with the attributes Requirements and Rank being used to define respectively the constraints and preferences that the matched entity must meet. Various strategies for ranking resources are presented, including an Estimated-Waiting-Time (EWT) algorithm, a throttled load balancing strategy, and a novel external ranking strategy based on data location. One of the unique features is a mechanism which transparently presents the GridX1 resources as a single compute element to the LHC Computing Grid (LCG), based at the CERN Laboratory in Geneva. This interface was used during the ATLAS data challenge 2 to federate the Canadian resources into the LCG without the overhead of maintaining separate LCG sites. Further, the BaBar particle physics simulation has been adapted to execute on GridX1 and resulted in a simplified management of the production. The usage of the throttled EWT and load balancing strategies combined with external data ranking was found to be very effective in improving efficiency and reducing the job failure rate.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call