Abstract

The Condor glidein-based workload management system (glideinWMS) has been developed and integrated with the distributed physics analysis and Monte Carlo (MC) production system of the Compact Muon Solenoid (CMS) experiment. Late binding between jobs and computing elements (CEs), together with validation of the WorkerNode (WN) environment, significantly reduces the failure rate of Grid jobs. For CPU-intensive MC data production, opportunistic Grid resources can be exploited effectively through an extended computing pool built on top of heterogeneous Grid resources. The Virtual Organization (VO) policy is embedded in the glideinWMS and pilot job configuration. GSI authentication and authorization, together with the interface to gLExec, allow a large user base to be supported and seamlessly integrated with the Grid computing infrastructure. Operation of glideinWMS at CMS shows that it is a highly available and stable system for a large VO with thousands of users running tens of thousands of jobs simultaneously. Enhanced monitoring allows system administrators and users to easily track system-level and job-level status.
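
The late-binding idea mentioned above can be illustrated with a minimal sketch. It is not glideinWMS code: the `Job`, `WorkerNode`, `validate_node`, and `run_pilot` names, the disk threshold, and the software-matching rule are all hypothetical, chosen only to show why a pilot that validates its node before pulling a job keeps user jobs away from broken nodes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Job:
    name: str
    needs_software: str  # hypothetical requirement used for matching


@dataclass
class WorkerNode:
    hostname: str
    installed_software: List[str]
    free_disk_gb: float


def validate_node(node: WorkerNode, min_disk_gb: float = 10.0) -> bool:
    """Hypothetical environment check run by the pilot before any user job."""
    return node.free_disk_gb >= min_disk_gb and len(node.installed_software) > 0


def run_pilot(node: WorkerNode, queue: Deque[Job]) -> Optional[Job]:
    """Late binding: a job is chosen only after the pilot is running on the node."""
    if not validate_node(node):
        # The pilot exits; no user job is ever sent to the unhealthy node.
        return None
    for job in list(queue):
        if job.needs_software in node.installed_software:
            queue.remove(job)  # the job is bound to this node only now
            return job
    return None  # nothing in the queue matches this node
```

In this toy model a node that fails validation costs only a pilot, while the user jobs stay queued until a healthy, matching node asks for work.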
