Abstract

The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves the time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented with the Python programming language, the scientific Python package ecosystem, and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation from the data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped so that the purpose of user code can be quickly intuited. Various data delivery schemes are wrapped into a common front-end that accepts user inputs and code, and returns user-defined outputs. We discuss our experience implementing analyses of CMS data with the coffea framework, the resulting user experience, and future directions.
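To illustrate this factorization, the sketch below shows a minimal coffea-style processor: the physics selection is written as columnar awkward-array operations inside a ProcessorABC subclass, while the choice of executor (here a local futures-based one) is made separately at run time. This is a sketch, not code from the paper: the dataset name, file path, and cut values are placeholders, and the exact Runner/executor interface may differ between coffea releases.

import awkward as ak
from coffea import processor
from coffea.nanoevents import NanoAODSchema

class DimuonProcessor(processor.ProcessorABC):
    """Count events containing an opposite-charge pair of good muons."""

    def process(self, events):
        muons = events.Muon
        # Columnar selection: each cut is a single array expression
        # applied to every event in the chunk at once.
        good = muons[(muons.pt > 20) & (abs(muons.eta) < 2.4)]
        pairs = ak.combinations(good, 2, fields=["mu1", "mu2"])
        opposite_charge = pairs.mu1.charge != pairs.mu2.charge
        n_selected = ak.sum(ak.any(opposite_charge, axis=1))
        return {
            events.metadata["dataset"]: {
                "selected": int(n_selected),
                "total": len(events),
            }
        }

    def postprocess(self, accumulator):
        return accumulator

# Placeholder fileset; in practice this would list real NanoAOD files.
fileset = {"DYJets": ["dyjets_nanoaod.root"]}

# The data delivery scheme is chosen here, independently of the processor.
run = processor.Runner(
    executor=processor.FuturesExecutor(workers=4),
    schema=NanoAODSchema,
)
output = run(fileset, treename="Events", processor_instance=DimuonProcessor())
print(output)

Because the processor never references how or where the data chunks are delivered, the same class can in principle be dispatched unchanged to the distributed executors coffea provides (for example Dask- or Parsl-based ones).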

Highlights

  • The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events

  • The CMS physicist/data-analyst is tasked with processing the resulting tens of terabytes of distilled data in a mostly autonomous fashion, typically designing a processing framework written in C++ or Python using a set of libraries known as the ROOT framework [3], and parallelizing the processing over distributed computing resources using HTCondor [5] or similar high-throughput computing systems

  • We introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue


Summary

Introduction

The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events. One of our core goals is to investigate the applicability of solutions found outside HEP to our data analysis needs. In these proceedings, we introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue.
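To make the columnar idea concrete before the section outline below, here is a minimal sketch (with toy data, not taken from the paper) contrasting an explicit event loop with the equivalent array-at-a-time expression using awkward-array:

import awkward as ak

# Jagged toy data standing in for, e.g., muon transverse momenta (GeV) per event.
muon_pt = ak.Array([[35.2, 12.1], [], [52.7], [8.3, 44.0, 21.5]])

# Row-at-a-time style: an explicit Python loop over events.
n_loop = sum(1 for event in muon_pt if any(pt > 30 for pt in event))

# Columnar style: one vectorized expression over all events at once.
n_columnar = int(ak.sum(ak.any(muon_pt > 30, axis=1)))

assert n_loop == n_columnar == 3

The columnar form expresses the cut once over the whole array of events, which is both closer to the physics intent and amenable to the vectorized, chunked execution on which coffea builds.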

Columnar Analysis
The coffea framework
Scalability
Future directions
Conclusions
