Abstract

In recent years, several studies have demonstrated the benefit of using deep learning to solve typical tasks related to high energy physics data taking and analysis. In particular, generative adversarial networks are a good candidate for supplementing the simulation of the detector response in a collider environment. Training of neural network models has been made tractable by improved optimization methods and by the advent of GP-GPUs, which are well adapted to the highly parallelizable task of training neural nets. Despite these advances, training large models over large data sets can take days to weeks. Moreover, finding the best model architecture and settings can require many expensive trials. To get the best out of this new technology, it is important to scale up the available network-training resources and, consequently, to provide tools for optimal large-scale distributed training. In this context, we describe the development of a new training workflow that scales on multi-node/multi-GPU architectures, with an eye to deployment on high performance computing machines. We describe the integration of hyperparameter optimization with a distributed training framework based on the Message Passing Interface, for models defined in keras [12] or pytorch [13]. We present results on the speedup obtained when training generative adversarial networks on a data set composed of the energy depositions from electrons, photons, charged and neutral hadrons in a fine-grained digital calorimeter.
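As an illustration of the kind of MPI-based distributed training the abstract refers to, the following is a minimal sketch of synchronous data parallelism with mpi4py and keras, in which each worker trains on its own data shard and the model weights are averaged across workers after every epoch. The weight-averaging scheme, layer sizes, and placeholder data are assumptions made for illustration only and do not reproduce the framework described in the paper.

    # Minimal sketch of synchronous data-parallel training over MPI (illustrative only;
    # the simple per-epoch weight averaging is an assumption, not the paper's framework).
    import numpy as np
    from mpi4py import MPI
    from tensorflow import keras

    comm = MPI.COMM_WORLD
    rank, n_workers = comm.Get_rank(), comm.Get_size()

    # Each worker builds the same model and sees a different shard of the data.
    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(100,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy")

    # Hypothetical data shard for this rank (random placeholders).
    x_shard = np.random.rand(1024, 100).astype("float32")
    y_shard = keras.utils.to_categorical(np.random.randint(10, size=1024), 10)

    for epoch in range(5):
        model.fit(x_shard, y_shard, epochs=1, batch_size=64, verbose=0)
        # Average the weights of all workers after each local epoch.
        averaged = []
        for w in model.get_weights():
            buf = np.empty_like(w)
            comm.Allreduce(w, buf, op=MPI.SUM)
            averaged.append(buf / n_workers)
        model.set_weights(averaged)
        if rank == 0:
            print(f"epoch {epoch}: weights synchronized across {n_workers} workers")

Such a script would be launched with an MPI launcher, for example `mpirun -np 4 python train_sketch.py`, so that each rank becomes one training worker.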

Highlights

  • Deep neural networks (DNN) are machine learning models with many parameters that are effectively trained using stochastic gradient descent methods

  • Depending on the topology of the high performance computing (HPC) system, there can be more than one general-purpose graphics processing unit (GP-GPU) per physical host; we enforce that no more than one process is associated with each GP-GPU (see the sketch after this list)

  • We review the technicalities of training neural networks on distributed systems
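The one-process-per-GP-GPU constraint mentioned in the highlights can be illustrated with the following sketch, which computes a local rank for each MPI process on its host and exposes a single GPU to it via CUDA_VISIBLE_DEVICES. The hostname-based local-rank computation is an assumption for illustration; the paper's framework may assign devices differently.

    # Minimal sketch of pinning each MPI process to a single GP-GPU on its host
    # (illustrative; the hostname-based local-rank computation is an assumption).
    import os
    import socket
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Determine the rank of this process among the processes running on the same host.
    host = socket.gethostname()
    all_hosts = comm.allgather(host)
    local_rank = [i for i, h in enumerate(all_hosts) if h == host].index(rank)

    # Expose exactly one GPU to this process (must happen before the deep learning
    # framework initializes CUDA), so no two processes share a GP-GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
    print(f"rank {rank} on {host} bound to GPU {local_rank}")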


Summary

Introduction

Deep neural networks (DNN) are machine learning models with many parameters that are effectively trained using stochastic gradient descent methods. Within the context of unsupervised learning and generative models, multiple neural networks can be trained concurrently in the generative adversarial network (GAN) scheme [1]. Training a GAN with large input data sets and a large input space turns out to be very compute-intensive, taking several hours per epoch. Such generative models have recently received tremendous publicity in the field of data science thanks to their success in generating complex data (mostly images), and applications of such models in the field of high energy physics are showing great promise [3].
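To make the adversarial scheme of [1] concrete, here is a minimal keras sketch of the alternating stochastic-gradient-descent training of a discriminator and a generator. The architecture, hyperparameters, and random placeholder data are purely illustrative assumptions and are not the calorimeter model studied in the paper.

    # Minimal sketch of adversarial training in keras (illustrative shapes and data).
    import numpy as np
    from tensorflow import keras

    latent_dim, data_dim = 16, 64

    generator = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(latent_dim,)),
        keras.layers.Dense(data_dim, activation="linear"),
    ])
    discriminator = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(data_dim,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="sgd", loss="binary_crossentropy")

    # The combined model trains only the generator to fool the (frozen) discriminator.
    discriminator.trainable = False
    gan = keras.Sequential([generator, discriminator])
    gan.compile(optimizer="sgd", loss="binary_crossentropy")

    real_data = np.random.rand(1024, data_dim).astype("float32")  # placeholder data set

    batch = 32
    for step in range(100):
        # 1) Train the discriminator on a batch of real and generated samples.
        noise = np.random.normal(size=(batch, latent_dim)).astype("float32")
        fake = generator.predict(noise, verbose=0)
        real = real_data[np.random.randint(0, len(real_data), batch)]
        discriminator.train_on_batch(real, np.ones((batch, 1)))
        discriminator.train_on_batch(fake, np.zeros((batch, 1)))
        # 2) Train the generator (through the combined model) to label its output as real.
        noise = np.random.normal(size=(batch, latent_dim)).astype("float32")
        gan.train_on_batch(noise, np.ones((batch, 1)))

In a distributed setting, each of these alternating updates is what gets parallelized across workers, which is why per-epoch training times of several hours motivate the scaling study described in the paper.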

Neural Networks and Generative Adversarial Network
Model Training with Stochastic Gradient Descent
Distributed Training
Batch Parallelism
Data Parallelism
Model Parallelism
Model Parameter Optimization
Bayesian Optimization using Gaussian Process Prior
Evolutionary Algorithms
Cross Validation
Results
Discussion

