Abstract

In this work, we investigated two approaches to detecting job anomalies and/or contention in large-scale computing efforts: (1) preemptive job scheduling using binomial-classification long short-term memory (LSTM) networks, and (2) forecasting intra-node computing loads from the currently active jobs and additional job(s). For approach 1, we achieved a 14% improvement in computational resource utilization and an overall classification accuracy of 85% on real tasks executed in a High Energy Physics computing workflow. In this paper, we present the preliminary results for the second approach.
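
A minimal sketch of approach 1, assuming each job is summarized by a fixed-length time series of monitoring metrics (e.g. CPU, memory, and I/O rates); the layer sizes, sequence length, and variable names below are illustrative placeholders rather than values from the paper:

    # Sketch: binary ("good"/"bad" job) LSTM classifier over job monitoring traces.
    # All shapes and hyperparameters are assumptions for illustration only.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Dropout

    n_timesteps, n_features = 60, 4   # e.g. 60 samples of 4 job/node metrics

    model = Sequential([
        LSTM(32, input_shape=(n_timesteps, n_features)),
        Dropout(0.2),
        Dense(1, activation="sigmoid"),  # probability that the job causes contention
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Synthetic stand-in data; in practice these would be labelled job traces.
    X = np.random.rand(256, n_timesteps, n_features).astype("float32")
    y = np.random.randint(0, 2, size=(256, 1))
    model.fit(X, y, epochs=2, batch_size=32, verbose=0)

    # A scheduler could hold back jobs whose predicted contention probability is high.
    print(model.predict(X[:1]))

A classifier of this kind supports the preemptive-scheduling idea: jobs predicted likely to cause contention can be deferred or placed on less loaded worker nodes.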

Highlights

  • The current computing grid scheduling tools do not provide a robust mechanism to protect computing resources from “bad” behaviors

  • Pilot agents run on Grid Worker Nodes (WNs), reserving resources for immediate use and requesting compute jobs from the workload management system (WMS)

  • Additional performance metrics were captured in a dedicated extension, and the JobID was used to relate the information between the DIRAC system and the extension
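
The join between the two data sources is straightforward in principle; the sketch below illustrates it with pandas, where every column other than JobID is a hypothetical placeholder:

    # Illustrative only: relate DIRAC job records and the monitoring extension's
    # performance metrics through the shared JobID key.
    import pandas as pd

    dirac_jobs = pd.DataFrame(
        {"JobID": [101, 102, 103], "Site": ["WN-A", "WN-B", "WN-A"], "Status": ["Done", "Done", "Failed"]}
    )
    extension_metrics = pd.DataFrame(
        {"JobID": [101, 102, 103], "mean_cpu": [0.92, 0.35, 0.88], "peak_mem_gb": [1.8, 3.9, 2.1]}
    )

    # Inner join on JobID yields one row per job combining scheduling state and
    # measured performance, ready for labelling and model training.
    joined = dirac_jobs.merge(extension_metrics, on="JobID", how="inner")
    print(joined)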

Summary

Introduction

The current computing grid scheduling tools do not provide a robust mechanism to protect computing resources from “bad” behaviors. There are limited mechanisms to handle input/output (I/O), memory, and networking contention. The motivation for this work is to build and study the use of a multi-tiered neural network model to identify and predict various sources of contention and to proactively schedule jobs. We investigate the ability to model the node-level (single-tier) computing load using a recurrent neural network.
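
As a concrete illustration of the single-tier model, the sketch below trains an LSTM to forecast the next node-load sample from a sliding window of past samples; the synthetic load trace, window length, and layer sizes are assumptions for illustration, not values from the study:

    # Sketch: recurrent (LSTM) forecaster for node-level computing load.
    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    def make_windows(series, window=30):
        """Turn a 1-D load series into (window -> next value) training pairs."""
        X = np.stack([series[i:i + window] for i in range(len(series) - window)])
        y = series[window:]
        return X[..., np.newaxis], y

    # Synthetic load trace standing in for real worker-node utilization data.
    load = (np.sin(np.linspace(0, 40, 2000)) + 1) / 2 + 0.05 * np.random.rand(2000)
    X, y = make_windows(load.astype("float32"))

    model = Sequential([LSTM(16, input_shape=(X.shape[1], 1)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=64, verbose=0)

    # Forecast the next load value from the most recent window.
    print(model.predict(X[-1:]))

Given such a per-node forecast, a scheduler could estimate whether adding a candidate job would push the node into contention before dispatching it.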

High Energy Physics Distributed Computing
Belle II computing use case
Demonstrator and Dataset
Neural-Net Training and Performance
Findings
Conclusions