Abstract

Modern experiments in high energy physics analyze millions of events recorded in particle detectors to select the events of interest and make measurements of physics parameters. These data are often stored in files as tables of detector information and reconstructed quantities. Most current techniques for event selection in these files lack the scalability needed for high performance computing environments. We describe our work to develop a high energy physics analysis framework suitable for high performance computing. This new framework utilizes modern tools for reading files and implicit data parallelism. Framework users analyze tabular data using standard, easy-to-use data analysis techniques in Python while the framework handles the file manipulations and parallelism, without the user needing advanced experience in parallel programming. In future versions, we hope to provide a framework that can be utilized on a personal computer or a high performance computing cluster with little change to the user code.
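The tabular, cut-based event selection described above can be illustrated with a small sketch. This is not PandAna's actual API; it is a hedged example using plain pandas, with invented column names (`energy`, `nhits`), to show the style of analysis the framework aims to support while hiding the parallelism from the user.

```python
import pandas as pd

# Hypothetical event table: each row is one reconstructed event.
events = pd.DataFrame({
    "run":    [1, 1, 2, 2],
    "energy": [0.5, 2.3, 1.8, 0.2],   # reconstructed energy (GeV), invented values
    "nhits":  [12, 80, 55, 7],        # number of detector hits, invented values
})

# Event selection as a boolean mask, the standard pandas idiom:
# keep events above an energy threshold with enough hits.
selected = events[(events["energy"] > 1.0) & (events["nhits"] > 20)]

print(len(selected))  # 2 events pass the cuts
```

In a framework like the one described, the user would write only the cut expressions; reading the file and distributing the rows across processes would happen behind the scenes.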

Highlights

  • High-Energy Physics (HEP) experiments continue to grow in size and complexity, requiring the employment of sophisticated analytical and computational tools

  • Datasets are approaching the exabyte scale, leading to challenges in data handling and process distribution for analysis programs. These challenges are pushing HEP experiments to adopt High Performance Computing (HPC) facilities, tools, and techniques in order to perform physics analyses efficiently

  • This paper introduces PandAna, an analysis framework built upon modern HPC tools and techniques

Summary

Introduction

High-Energy Physics (HEP) experiments continue to grow in size and complexity, requiring the employment of sophisticated analytical and computational tools. Datasets are approaching the exabyte scale, leading to challenges in data handling and process distribution for analysis programs. These challenges are pushing HEP experiments to adopt High Performance Computing (HPC) facilities, tools, and techniques in order to perform physics analyses efficiently. PandAna uses the HDF5 [1] file format, widely used for HPC applications, to support efficient and scalable storage. Python libraries such as h5py [2] and mpi4py [3] are utilized internally to provide easy-to-use parallel I/O and processing capabilities that remove the scalability limitations of traditional HEP analyses, without requiring parallel programming experience from the user. NOvA has used PandAna for basic data selection for machine learning particle identification algorithms.
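In an MPI-based design like the one described, each rank would typically open the HDF5 file with h5py and read only its contiguous slice of rows. The core of that scheme is the partitioning arithmetic. Below is a minimal, framework-independent sketch of one common way to split `n_rows` evenly across `n_ranks` (the first `n_rows % n_ranks` ranks get one extra row); the function name and this exact scheme are illustrative, not taken from PandAna.

```python
def rank_slice(n_rows: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open row range [start, stop) owned by `rank`.

    Distributes n_rows as evenly as possible: the first (n_rows % n_ranks)
    ranks each receive one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Example: 10 rows over 3 ranks -> (0, 4), (4, 7), (7, 10).
# With mpi4py and h5py (assumed available), each rank might then do roughly:
#   rank = MPI.COMM_WORLD.Get_rank()
#   start, stop = rank_slice(n_rows, MPI.COMM_WORLD.Get_size(), rank)
#   local = h5file["rec.energy"][start:stop]   # dataset name is invented
for r in range(3):
    print(r, rank_slice(10, 3, r))
```

Because every rank computes its own slice from the same inputs, no communication is needed to partition the work, which is what lets the framework hide the parallelism from the analysis code.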

Neutrino Experiments
Experimental Needs
Background
PandAna Framework
Dependencies
Proxy DataFrame
PandAna Analysis
Future Work
