Abstract
The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves the time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented in the Python programming language, using the scientific Python package ecosystem and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation from the data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped so that the purpose of user code can be quickly intuited. Various data delivery schemes are wrapped into a common front-end, which accepts user inputs and code and returns user-defined outputs. We discuss our experience in implementing analysis of CMS data using the coffea framework, along with a discussion of the user experience and future directions.
Highlights
The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events.
The CMS physicist/data-analyst is tasked with processing the resulting tens of terabytes of distilled data in a mostly autonomous fashion, typically designing a processing framework written in C++ or Python using a set of libraries known as the ROOT framework [3], and parallelizing the processing over distributed computing resources using HTCondor [5] or similar high-throughput computing systems.
We introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue.
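To make the columnar-analysis concept concrete, the following NumPy sketch contrasts a traditional event-loop selection with the equivalent columnar one. The columns and cut values are hypothetical placeholders, not drawn from an actual CMS analysis.

```python
import numpy as np

# Toy flat columns: one entry per event (hypothetical values)
pt = np.array([12.0, 48.5, 33.2, 7.9, 60.1])
eta = np.array([0.1, -2.0, 1.5, 0.4, -0.3])

# Event-loop style: visit events one at a time
passed_loop = [p for p, e in zip(pt, eta) if p > 30 and abs(e) < 2.4]

# Columnar style: apply the same cuts to entire arrays at once
mask = (pt > 30) & (np.abs(eta) < 2.4)
passed_columnar = pt[mask]

# Both styles select the same events; the columnar form is vectorized
assert list(passed_columnar) == passed_loop
```

The two forms are physics-equivalent, but the columnar one dispatches the arithmetic to optimized array kernels, which is the performance and readability argument the proceedings develop.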
Summary
The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events. One of our core goals is to investigate the applicability of solutions found outside HEP to our data analysis needs. In these proceedings, we introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue.