Abstract

In recent years, several studies have demonstrated the benefit of using deep learning to solve typical tasks related to high energy physics data taking and analysis. In particular, generative adversarial networks are a good candidate for supplementing the simulation of the detector response in a collider environment. Training of neural network models has been made tractable by improved optimization methods and by the advent of GP-GPUs, which are well adapted to the highly parallelizable task of training neural nets. Despite these advances, training large models over large data sets can take days to weeks. Moreover, finding the best model architecture and settings can require many expensive trials. To get the best out of this new technology, it is important to scale up the available network-training resources and, consequently, to provide tools for optimal large-scale distributed training. In this context, we describe the development of a new training workflow that scales on multi-node/multi-GPU architectures, with an eye to deployment on high performance computing machines. We describe the integration of hyperparameter optimization with a distributed training framework based on the Message Passing Interface, for models defined in keras [12] or pytorch [13]. We present results on the speedup obtained when training generative adversarial networks on a data set composed of the energy depositions from electrons, photons, charged and neutral hadrons in a fine-grained digital calorimeter.
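As an illustration of the kind of MPI-based distributed training the abstract refers to, the following is a minimal sketch of synchronous data parallelism with mpi4py and keras, in which each worker trains on its own data shard and the model weights are averaged across workers after every epoch. The weight-averaging scheme, layer sizes, and placeholder data are assumptions made for illustration only and do not reproduce the framework described in the paper.

    # Minimal sketch of synchronous data-parallel training over MPI (illustrative only;
    # the simple per-epoch weight averaging is an assumption, not the paper's framework).
    import numpy as np
    from mpi4py import MPI
    from tensorflow import keras

    comm = MPI.COMM_WORLD
    rank, n_workers = comm.Get_rank(), comm.Get_size()

    # Each worker builds the same model and sees a different shard of the data.
    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", input_shape=(100,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="sgd", loss="categorical_crossentropy")

    # Hypothetical data shard for this rank (random placeholders).
    x_shard = np.random.rand(1024, 100).astype("float32")
    y_shard = keras.utils.to_categorical(np.random.randint(10, size=1024), 10)

    for epoch in range(5):
        model.fit(x_shard, y_shard, epochs=1, batch_size=64, verbose=0)
        # Average the weights of all workers after each local epoch.
        averaged = []
        for w in model.get_weights():
            buf = np.empty_like(w)
            comm.Allreduce(w, buf, op=MPI.SUM)
            averaged.append(buf / n_workers)
        model.set_weights(averaged)
        if rank == 0:
            print(f"epoch {epoch}: weights synchronized across {n_workers} workers")

Such a script would be launched with an MPI launcher, for example `mpirun -np 4 python train_sketch.py`, so that each rank becomes one training worker.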

Highlights

  • Deep neural networks (DNN) are machine learning models with many parameters that are effectively trained using stochastic gradient descent methods

  • Depending on the topology of the high performance computing (HPC) system, there can be more than one general-purpose graphics processing unit (GP-GPU) per physical host; we enforce that no more than one process is associated with each GP-GPU (see the sketch after this list)

  • We review the technicalities of training neural networks on distributed systems
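The one-process-per-GP-GPU constraint mentioned in the highlights can be illustrated with the following sketch, which computes a local rank for each MPI process on its host and exposes a single GPU to it via CUDA_VISIBLE_DEVICES. The hostname-based local-rank computation is an assumption for illustration; the paper's framework may assign devices differently.

    # Minimal sketch of pinning each MPI process to a single GP-GPU on its host
    # (illustrative; the hostname-based local-rank computation is an assumption).
    import os
    import socket
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Determine the rank of this process among the processes running on the same host.
    host = socket.gethostname()
    all_hosts = comm.allgather(host)
    local_rank = [i for i, h in enumerate(all_hosts) if h == host].index(rank)

    # Expose exactly one GPU to this process (must happen before the deep learning
    # framework initializes CUDA), so no two processes share a GP-GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
    print(f"rank {rank} on {host} bound to GPU {local_rank}")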


Summary

Introduction

Deep neural networks (DNN) are machine learning models with many parameters that are effectively trained using stochastic gradient descent methods. Within the context of unsupervised learning and generative models, multiple neural networks can be trained concurrently in the generative adversarial network (GAN) scheme [1]. Training a GAN with large input data sets and a large input space turns out to be very compute-intensive, taking several hours per epoch. Such generative models have recently received tremendous publicity in the field of data science thanks to their success in generating complex data (mostly images), and applications of such models in the field of high energy physics are showing great promise [3].
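To make the adversarial scheme of [1] concrete, here is a minimal keras sketch of the alternating stochastic-gradient-descent training of a discriminator and a generator. The architecture, hyperparameters, and random placeholder data are purely illustrative assumptions and are not the calorimeter model studied in the paper.

    # Minimal sketch of adversarial training in keras (illustrative shapes and data).
    import numpy as np
    from tensorflow import keras

    latent_dim, data_dim = 16, 64

    generator = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(latent_dim,)),
        keras.layers.Dense(data_dim, activation="linear"),
    ])
    discriminator = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(data_dim,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="sgd", loss="binary_crossentropy")

    # The combined model trains only the generator to fool the (frozen) discriminator.
    discriminator.trainable = False
    gan = keras.Sequential([generator, discriminator])
    gan.compile(optimizer="sgd", loss="binary_crossentropy")

    real_data = np.random.rand(1024, data_dim).astype("float32")  # placeholder data set

    batch = 32
    for step in range(100):
        # 1) Train the discriminator on a batch of real and generated samples.
        noise = np.random.normal(size=(batch, latent_dim)).astype("float32")
        fake = generator.predict(noise, verbose=0)
        real = real_data[np.random.randint(0, len(real_data), batch)]
        discriminator.train_on_batch(real, np.ones((batch, 1)))
        discriminator.train_on_batch(fake, np.zeros((batch, 1)))
        # 2) Train the generator (through the combined model) to label its output as real.
        noise = np.random.normal(size=(batch, latent_dim)).astype("float32")
        gan.train_on_batch(noise, np.ones((batch, 1)))

In a distributed setting, each of these alternating updates is what gets parallelized across workers, which is why per-epoch training times of several hours motivate the scaling study described in the paper.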

Neural Networks and Generative Adversarial Network
Model Training with Stochastic Gradient Descent
Distributed Training
Batch Parallelism
Data Parallelism
Model Parallelism
Model Parameter Optimization
Bayesian Optimization using Gaussian Process Prior
Evolutionary Algorithms
Cross Validation
Results
Discussion

