Abstract
The coffea framework provides a new approach to High-Energy Physics analysis, via columnar operations, that improves the time-to-insight, scalability, portability, and reproducibility of analysis. It is implemented in the Python programming language, using the scientific Python package ecosystem and commodity big data technologies. To achieve this suite of improvements across many use cases, coffea takes a factorized approach, separating the analysis implementation from the data delivery scheme. All analysis operations are implemented using the NumPy or awkward-array packages, which are wrapped so that the purpose of user code can be quickly intuited. Various data delivery schemes are wrapped into a common front-end, which accepts user inputs and code and returns user-defined outputs. We discuss our experience in implementing analysis of CMS data using the coffea framework, along with a discussion of the user experience and future directions.
Highlights
The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events.
The CMS physicist/data-analyst is tasked with processing the resulting tens of terabytes of distilled data in a mostly autonomous fashion, typically designing a processing framework written in C++ or Python using a set of libraries known as the ROOT framework [3], and parallelizing the processing over distributed computing resources using HTCondor [5] or similar high-throughput computing systems.
We introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue.
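To make the columnar-analysis concept concrete, the following NumPy sketch contrasts a traditional event-loop selection with the equivalent columnar one. The columns and cut values are hypothetical placeholders, not drawn from an actual CMS analysis.

```python
import numpy as np

# Toy flat columns: one entry per event (hypothetical values)
pt = np.array([12.0, 48.5, 33.2, 7.9, 60.1])
eta = np.array([0.1, -2.0, 1.5, 0.4, -0.3])

# Event-loop style: visit events one at a time
passed_loop = [p for p, e in zip(pt, eta) if p > 30 and abs(e) < 2.4]

# Columnar style: apply the same cuts to entire arrays at once
mask = (pt > 30) & (np.abs(eta) < 2.4)
passed_columnar = pt[mask]

# Both styles select the same events; the columnar form is vectorized
assert list(passed_columnar) == passed_loop
```

The two forms are physics-equivalent, but the columnar one dispatches the arithmetic to optimized array kernels, which is the performance and readability argument the proceedings develop.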
Summary
The present challenge for High-Energy Particle Physics (HEP) data analysts is daunting: due to the success of the Large Hadron Collider (LHC) data collection campaign over Run 2 (2015-2018), the Compact Muon Solenoid (CMS) detector has amassed a dataset of order 10 billion proton-proton collision events. One of our core goals is to investigate the applicability of solutions found outside HEP to our data analysis needs. In these proceedings, we introduce the concept of columnar analysis and the coffea framework, discuss the user experience and scalability characteristics of the framework, and propose future directions for analysis systems research and development that we will pursue.