Spark and HPC for High Energy Physics Data Analyses

Saba Sehrish, Marc Paterno, Jim Kowalkowski

doi:10.1109/ipdpsw.2017.112

Abstract

A full High Energy Physics (HEP) data analysis is divided into multiple data reduction phases. Processing within these phases is extremely time-consuming; therefore, intermediate results are stored in files held in mass storage systems and referenced as part of large datasets. This processing model limits what can be done with interactive data analytics. The growing size and complexity of experimental datasets, together with emerging big data tools, are beginning to change the traditional ways of doing data analyses. Using big data tools for HEP analysis looks promising, mainly because extremely large HEP datasets can be represented and held in memory across a system, and accessed interactively by encoding an analysis using high-level programming abstractions. The mainstream tools, however, are not designed for scientific computing or for exploiting the features available on HPC platforms. We use an example from the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) in Geneva, Switzerland. The LHC is the highest-energy particle collider in the world. Our use case focuses on searching for new types of elementary particles that could explain Dark Matter in the universe. We use HDF5 as our input data format and Spark to implement the use case. We show the benefits and limitations of using Spark with HDF5 on Edison at NERSC.
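As a rough illustration of what "encoding an analysis using high-level programming abstractions" can look like, here is a minimal pure-Python sketch of chained data reduction phases (select, derive, aggregate), written in the style of Spark's filter/map/aggregate transformations. It is not the authors' implementation and uses neither Spark nor HDF5: the event fields (`met`, `jet_pts`), the cut values, the bin width, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical in-memory event records; field names and values are illustrative only.
EVENTS = [
    {"run": 1, "met": 250.0, "jet_pts": [120.0, 60.0]},
    {"run": 1, "met": 80.0, "jet_pts": [40.0]},
    {"run": 2, "met": 310.0, "jet_pts": [200.0, 90.0, 35.0]},
    {"run": 2, "met": 150.0, "jet_pts": []},
]


def select(events, met_cut=100.0, min_jets=1):
    """Reduction phase 1: keep events passing a hypothetical selection (like a filter)."""
    return (e for e in events if e["met"] > met_cut and len(e["jet_pts"]) >= min_jets)


def leading_jet_pt(events):
    """Reduction phase 2: derive one quantity per selected event (like a map)."""
    return (max(e["jet_pts"]) for e in events)


def histogram(values, bin_width=50.0):
    """Reduction phase 3: count values in fixed-width bins keyed by the bin's lower edge."""
    counts = Counter(bin_width * (v // bin_width) for v in values)
    return dict(sorted(counts.items()))


# The whole analysis is one composed expression over data held in memory.
hist = histogram(leading_jet_pt(select(EVENTS)))
print(hist)
```

Because each phase is a lazy generator, no intermediate result is written out between phases. In PySpark the same chain would map roughly onto `RDD.filter` and `RDD.map` followed by an aggregation such as `RDD.countByValue`, with the records distributed across the nodes of the system.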
