Abstract

In recent years the use of machine learning techniques within data-intensive sciences in general, and high-energy physics in particular, has increased rapidly, in part due to the availability of large datasets on which such algorithms can be trained, as well as suitable hardware, such as graphics or tensor processing units, which greatly accelerate the training and execution of such algorithms. Within the HEP domain, the development of these techniques has so far relied on resources external to the primary computing infrastructure of the WLCG (Worldwide LHC Computing Grid). In this paper we present an integration of hardware-accelerated workloads into the Grid through the declaration of dedicated queues with access to hardware accelerators and the use of Linux container images holding a modern data science software stack. A frequent use case in the development of machine learning algorithms is the optimization of neural networks through the tuning of their hyper-parameters (HPs). This often requires training and comparing a large range of network variations, which for some optimization schemes can be performed in parallel, a workload well suited to Grid computing. An example of such a hyper-parameter scan on Grid resources for the case of flavor tagging within ATLAS is presented.
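
As a rough illustration of why such a scan maps well onto Grid computing, the sketch below draws independent hyper-parameter configurations by random search, so that each configuration can be packaged as its own job and trained in parallel. This is a minimal sketch only: the search space, the train_and_evaluate placeholder, and all parameter names are illustrative assumptions and are not taken from the ATLAS flavor-tagging study.

    import json
    import random

    # Hypothetical search space: the parameter names and ranges are placeholders,
    # not those used in the ATLAS flavor-tagging scan.
    SEARCH_SPACE = {
        "learning_rate": [1e-4, 3e-4, 1e-3],
        "batch_size": [256, 512, 1024],
        "hidden_units": [64, 128, 256],
        "dropout": [0.0, 0.1, 0.2],
    }

    def sample_configurations(n_trials, seed=0):
        """Draw independent configurations (random search).

        The trials do not depend on each other, so each configuration can be
        shipped to the Grid as a separate job and trained in parallel.
        """
        rng = random.Random(seed)
        return [
            {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
            for _ in range(n_trials)
        ]

    def train_and_evaluate(config):
        """Stand-in for one training; a real job would train the tagging
        network on a GPU and return a validation metric."""
        return -config["learning_rate"] * config["hidden_units"]

    if __name__ == "__main__":
        trials = sample_configurations(n_trials=8)
        # On the Grid, each entry of `trials` would become one job payload,
        # e.g. a JSON file shipped alongside the container image.
        for i, config in enumerate(trials):
            print(i, json.dumps(config), train_and_evaluate(config))

Because the trials are loosely coupled, the same sampling step can feed any number of Grid jobs; only the final comparison of validation metrics needs to gather their outputs.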

Highlights

  • The increase in dataset size and computing resource requirements for the HL-LHC is pushing WLCG experiments to look at Machine Learning (ML) techniques to improve the efficiency of data analysis and processing

  • Enabling GPUs on the WLCG Grid: the WLCG distributed resources have been built around the HTC (High Throughput Computing) paradigm, which focuses on the efficient execution of a large number of loosely coupled tasks

  • The use of GPUs in ATLAS, and more generally in WLCG, may increase due to the introduction of ML and to accelerator resources coming online both at sites and at HPC centres


Summary

Introduction

The increase in dataset size and computing resource requirements for the HL-LHC is pushing WLCG experiments to look at Machine Learning (ML) techniques to improve the efficiency of data analysis and processing. In this paper we present an integration of hardware-accelerated workloads into the Grid through the declaration of dedicated queues with access to hardware accelerators and the use of Linux container images holding a modern data science software stack. An example of a hyper-parameter scan on Grid resources for the case of flavor tagging within ATLAS is presented.
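
To make the job structure concrete, the sketch below shows what a per-job entry point for such a Grid-based scan could look like: each job receives a trial index, looks up its hyper-parameter configuration from a file shipped alongside the container image, trains one network variant, and writes its metric to an output file. The file names, argument conventions, and the placeholder scoring are assumptions for illustration, not the actual ATLAS workflow or PanDA submission interface.

    #!/usr/bin/env python3
    """Hypothetical per-job entry point for a Grid-based hyper-parameter scan."""
    import argparse
    import json
    from pathlib import Path

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--trial-index", type=int, required=True)
        parser.add_argument("--config-file", default="trials.json")
        parser.add_argument("--output-file", default="result.json")
        args = parser.parse_args()

        # One configuration per Grid job: loosely coupled tasks fit the
        # HTC model of the WLCG well.
        trials = json.loads(Path(args.config_file).read_text())
        config = trials[args.trial_index]

        # Placeholder training call; a real job would train on the GPU made
        # available through the dedicated accelerator queue.
        score = -config["learning_rate"] * config["hidden_units"]

        Path(args.output_file).write_text(
            json.dumps({"trial": args.trial_index, "config": config, "score": score})
        )

    if __name__ == "__main__":
        main()

In this pattern the output files of all jobs are collected afterwards and the configuration with the best validation metric is selected, so the only coordination between jobs happens at submission and at the final comparison step.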

