Abstract

The INFN Tier-1 datacentre provides computing resources to several HEP and astrophysics experiments. These are organized in Virtual Organizations that submit jobs to our computing facilities through Computing Elements (CEs), which act as Grid interfaces to the Local Resource Manager (LRMS). We are phasing out our current LRMS (IBM/Platform LSF 9.1.3) and CEs (CREAM), adopting HTCondor as a replacement for LSF and HTCondor-CE in place of CREAM. A small instance was set up to practice cluster management and to evaluate the feasibility of our migration plans to the new LRMS and CE set; a second cluster instance was then set up for production. A number of management tools have been adapted or rewritten to integrate the new system with the existing infrastructure. Two different accounting solutions for the HTCondor-CE have been implemented, and the more reliable one has been adopted. A Python tool has been written to disentangle the management of HTCondor machines from our Puppet instance and to enable quicker configuration of the cluster nodes. The monitoring tools tied to the old system are being adapted to also work with the new one. Finally, the most relevant setup steps have been documented in a public wiki page, and a support mailing list has been created to help other INFN sites willing to migrate their LRMS and CE to HTCondor. This document reports on our experience with HTCondor-CE on top of HTCondor and on the integration of this system into our infrastructure.
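The abstract mentions a Python tool that decouples HTCondor machine management from Puppet to speed up node configuration. The actual INFN-T1 tool is not reproduced here; purely as an illustration of the idea, a minimal sketch that renders an HTCondor configuration fragment for a node from its role (role names, the template contents, and the central-manager hostname are assumptions) might look like:

```python
# Illustrative sketch only: not the actual INFN-T1 tool.
# Renders a minimal HTCondor configuration fragment for a node, given its role.
# The "use ROLE: ..." lines are standard HTCondor configuration metaknobs.
ROLE_TEMPLATES = {
    "worker": "use ROLE: Execute\nCONDOR_HOST = {cm}\nNUM_SLOTS = {slots}\n",
    "schedd": "use ROLE: Submit\nCONDOR_HOST = {cm}\n",
}

def render_config(role: str, central_manager: str, slots: int = 1) -> str:
    """Return the HTCondor config text for a node of the given role."""
    try:
        template = ROLE_TEMPLATES[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}")
    return template.format(cm=central_manager, slots=slots)

if __name__ == "__main__":
    # Hypothetical central-manager hostname, for demonstration only.
    print(render_config("worker", "htc-cm.example.org", slots=32))
```

Keeping role templates in plain data like this is what allows reconfiguring a node without a full configuration-management run.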

Highlights

  • The INFN-T1 currently provides a computing power of about 400 kHS06 over more than 35,000 computing slots on approximately one thousand physical Worker Nodes

  • Grid users cannot directly interact with the site resources

  • They contact Computing Elements (CEs) at the site. These services are designed to act as a frontend between Grid users and the underlying Local Resource Manager (LRMS), submitting and managing jobs on the users' behalf throughout their whole life cycle
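The submit-and-manage flow described above can be illustrated with a small sketch that builds the text of an HTCondor submit description, of the kind a job routed by a CE to the local schedd would carry. This is a generic, hypothetical example (the executable, arguments, and file names are placeholders, not taken from the INFN-T1 setup):

```python
# Hypothetical illustration: build the text of a plain HTCondor submit
# description, like one a CE might hand to the underlying LRMS (HTCondor).
def build_submit_description(executable: str, args: str, queue_count: int = 1) -> str:
    """Return a vanilla-universe HTCondor submit description as a string."""
    lines = [
        f"executable = {executable}",
        f"arguments  = {args}",
        "universe   = vanilla",
        "output     = job.$(ClusterId).$(ProcId).out",
        "error      = job.$(ClusterId).$(ProcId).err",
        "log        = job.$(ClusterId).log",
        f"queue {queue_count}",
    ]
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_submit_description("/usr/bin/env", "echo hello", queue_count=2))
```

The point of the CE is that Grid users never write or submit such a description to the local schedd themselves; the CE does it on their behalf.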


Summary

Introduction

The INFN-T1 currently provides a computing power of about 400 kHS06 over more than 35,000 computing slots on approximately one thousand physical Worker Nodes. We adopted the CREAM [2] CE implementation at our site for several years, and it proved to be an effective solution for providing Grid access to our local resources, managed by LSF.
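As a back-of-the-envelope check on the figures above (all of which are approximate in the text), the quoted totals imply an average of roughly 11 HS06 per slot and about 35 slots per physical node:

```python
# Back-of-the-envelope check of the quoted capacity figures (all approximate).
total_hs06 = 400_000   # ~400 kHS06 total computing power
slots = 35_000         # more than 35,000 computing slots
nodes = 1_000          # approximately one thousand physical Worker Nodes

per_slot = total_hs06 / slots   # average HS06 per computing slot
per_node = slots / nodes        # average slots per physical node
print(f"~{per_slot:.1f} HS06 per slot, ~{per_node:.0f} slots per node")
# → ~11.4 HS06 per slot, ~35 slots per node
```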

Results
Conclusion

