Abstract

The H1 Virtual Organization (VO), as one of the small VOs, employs most components of the EMI or gLite Middleware. In this framework, a monitoring system has been designed for the H1 Experiment to identify the most suitable GRID resources for executing CPU-intensive Monte Carlo (MC) simulation tasks (jobs). The monitored resources are Computing Elements (CEs), Storage Elements (SEs), WMS servers (WMSs), the CernVM File System (CVMFS) available to the VO HONE, and local GRID User Interfaces (UIs).

The general principle of monitoring GRID elements is based on the execution of short test jobs on different CE queues, submitted both through various WMSs and directly to the CREAM-CEs. Real H1 MC production jobs with a small number of events are used to perform the tests. Test jobs are periodically submitted to GRID queues, their status is checked, the output files of completed jobs are retrieved, the result of each job is analyzed, and the waiting time and run time are derived. Using this information, the status of the GRID elements is estimated, and the most suitable ones are included in the automatically generated configuration files used in the H1 MC production.

The monitoring system allows problems at the GRID sites to be identified and reacts to them promptly (for example, by sending GGUS (Global Grid User Support) trouble tickets). The system can easily be adapted to identify the optimal resources for tasks other than MC production, simply by switching to the relevant test jobs. The monitoring system is written mostly in Python and Perl, with a few shell scripts.

In addition to the test monitoring system, we use information from real production jobs to monitor the availability and quality of the GRID resources. The monitoring tools register the number of job resubmissions and the percentage of failed and finished jobs relative to all jobs on the CEs, and determine the average waiting and running times for the GRID queues involved. CEs that do not meet the set criteria can be removed from the production chain by including them in an exception table. All of these monitoring actions lead to more reliable and faster execution of MC requests.
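The selection logic described above can be illustrated with a minimal Python sketch. It is not the H1 implementation: the `JobRecord` layout, the function names `summarize` and `select_ces`, and the threshold values are all hypothetical, chosen only to show how per-CE failure fractions and mean waiting and running times might be derived from job records and then combined with an exception table to pick usable CEs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class JobRecord:
    """One test or production job (hypothetical record layout)."""
    ce: str                     # CE queue the job was sent to
    submitted: float            # submission time, in seconds
    started: Optional[float]    # start time, None if the job never ran
    finished: Optional[float]   # end time, None if the job never finished
    ok: bool                    # True if the job output passed the checks


def summarize(jobs):
    """Group jobs by CE and derive failure fraction and mean wait/run times."""
    by_ce = {}
    for job in jobs:
        by_ce.setdefault(job.ce, []).append(job)

    summary = {}
    for ce, records in by_ce.items():
        # Only successful jobs with complete timestamps contribute to timings.
        done = [j for j in records
                if j.ok and j.started is not None and j.finished is not None]
        summary[ce] = {
            "jobs": len(records),
            "failed_fraction": 1 - len(done) / len(records),
            "mean_wait": mean([j.started - j.submitted for j in done]) if done else None,
            "mean_run": mean([j.finished - j.started for j in done]) if done else None,
        }
    return summary


def select_ces(summary, max_failed_fraction=0.2, max_mean_wait=3600.0,
               exceptions=frozenset()):
    """Return CEs that meet the (assumed) criteria, shortest mean wait first."""
    good = []
    for ce, stats in summary.items():
        if ce in exceptions:                 # manually excluded CE
            continue
        if stats["mean_wait"] is None:       # no successful job at all
            continue
        if stats["failed_fraction"] > max_failed_fraction:
            continue
        if stats["mean_wait"] > max_mean_wait:
            continue
        good.append(ce)
    return sorted(good, key=lambda ce: summary[ce]["mean_wait"])
```

In this sketch, the list returned by `select_ces` plays the role of the CE list written to a generated configuration file, and the `exceptions` set stands in for the exception table; real criteria would likely involve more inputs, such as resubmission counts.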
