Abstract

Modern experiments in high energy physics analyze millions of events recorded in particle detectors to select the events of interest and make measurements of physics parameters. These data are often stored in files as tables of detector information and reconstructed quantities. Most current techniques for event selection in these files lack the scalability needed for high performance computing environments. We describe our work to develop a high energy physics analysis framework suitable for high performance computing. This new framework utilizes modern tools for reading files and implicit data parallelism. Framework users analyze tabular data using standard, easy-to-use data analysis techniques in Python while the framework handles the file manipulations and parallelism, without the user needing advanced experience in parallel programming. In future versions, we hope to provide a framework that can be utilized on a personal computer or a high performance computing cluster with little change to the user code.
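The tabular, cut-based event selection described above can be illustrated with a small sketch. This is not PandAna's actual API; it is a hedged example using plain pandas, with invented column names (`energy`, `nhits`), to show the style of analysis the framework aims to support while hiding the parallelism from the user.

```python
import pandas as pd

# Hypothetical event table: each row is one reconstructed event.
events = pd.DataFrame({
    "run":    [1, 1, 2, 2],
    "energy": [0.5, 2.3, 1.8, 0.2],   # reconstructed energy (GeV), invented values
    "nhits":  [12, 80, 55, 7],        # number of detector hits, invented values
})

# Event selection as a boolean mask, the standard pandas idiom:
# keep events above an energy threshold with enough hits.
selected = events[(events["energy"] > 1.0) & (events["nhits"] > 20)]

print(len(selected))  # 2 events pass the cuts
```

In a framework like the one described, the user would write only the cut expressions; reading the file and distributing the rows across processes would happen behind the scenes.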

Highlights

  • High-Energy Physics (HEP) experiments continue to grow in size and complexity, requiring the employment of sophisticated analytical and computational tools

  • Datasets are approaching the exabyte scale, leading to challenges in data handling and process distribution for analysis programs. These challenges are pushing HEP experiments to adopt High Performance Computing (HPC) facilities, tools, and techniques in order to perform physics analyses efficiently

  • This paper introduces PandAna, an analysis framework built upon modern HPC tools and techniques

Summary

Introduction

High-Energy Physics (HEP) experiments continue to grow in size and complexity, requiring the employment of sophisticated analytical and computational tools. Datasets are approaching the exabyte scale, leading to challenges in data handling and process distribution for analysis programs. These challenges are pushing HEP experiments to adopt High Performance Computing (HPC) facilities, tools, and techniques in order to perform physics analyses efficiently. PandAna uses the HDF5 [1] file format, widely used for HPC applications, to support efficient and scalable storage. Python libraries such as h5py [2] and mpi4py [3] are utilized internally to provide easy-to-use parallel I/O and processing capabilities that remove the scalability limitations of traditional HEP analyses, without requiring parallel programming experience from the user. NOvA has used PandAna for basic data selection for machine learning particle identification algorithms.
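In an MPI-based design like the one described, each rank would typically open the HDF5 file with h5py and read only its contiguous slice of rows. The core of that scheme is the partitioning arithmetic. Below is a minimal, framework-independent sketch of one common way to split `n_rows` evenly across `n_ranks` (the first `n_rows % n_ranks` ranks get one extra row); the function name and this exact scheme are illustrative, not taken from PandAna.

```python
def rank_slice(n_rows: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open row range [start, stop) owned by `rank`.

    Distributes n_rows as evenly as possible: the first (n_rows % n_ranks)
    ranks each receive one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# Example: 10 rows over 3 ranks -> (0, 4), (4, 7), (7, 10).
# With mpi4py and h5py (assumed available), each rank might then do roughly:
#   rank = MPI.COMM_WORLD.Get_rank()
#   start, stop = rank_slice(n_rows, MPI.COMM_WORLD.Get_size(), rank)
#   local = h5file["rec.energy"][start:stop]   # dataset name is invented
for r in range(3):
    print(r, rank_slice(10, 3, r))
```

Because every rank computes its own slice from the same inputs, no communication is needed to partition the work, which is what lets the framework hide the parallelism from the analysis code.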

Neutrino Experiments
Experimental Needs
Background
PandAna Framework
Dependencies
Proxy DataFrame
PandAna Analysis
Future Work
