Abstract

The traditional approach in HEP analysis software is to loop over every event and every object via the ROOT framework. This method follows an imperative paradigm, in which the code is tied to the storage format and the steps of execution. A more desirable strategy would be to implement a declarative language, such that the storage medium and execution are not included in the abstraction model. This will become increasingly important for managing the large datasets collected by the LHC and the HL-LHC. A new analysis description language (ADL) inspired by functional programming, FuncADL, was developed using Python as a host language. The expressiveness of this language was tested by implementing example analysis tasks designed to benchmark the functionality of ADLs. Many simple selections are expressible in a declarative way with FuncADL, which can be used as an interface to retrieve filtered data. Some limitations were identified, but the design of the language allows for future extensions to add missing features. FuncADL is part of a suite of analysis software tools being developed by the Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP). These tools will be available to develop highly scalable physics analyses for the LHC.

Highlights

  • Analyses of high energy physics (HEP) data typically consist of executing a carefully designed algorithm on millions of collision events

  • The Uproot FuncADL backend translates a query abstract syntax tree (AST) into generated Python source code for a function that can be evaluated on a data file and returns the selected and transformed values as an Awkward array

  • The Uproot and RDataFrame backends are both able to run on flat ntuples, so this provides the option for alternative backends and performance benchmark comparisons between them
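
The code-generation idea in the second highlight (query AST in, Python function source out) can be illustrated with a minimal sketch. It uses only Python's standard `ast` module (Python 3.9 or later for `ast.unparse`) and is not the actual FuncADL Uproot backend. The helper `generate_function_source`, the generated function name `run_query`, the `jet_pt` field, and the plain dictionary standing in for an Awkward array are all assumptions made for illustration.

```python
import ast


def generate_function_source(query: str, func_name: str = "run_query") -> str:
    """Turn a lambda-expression query string into source for a named function.

    Hypothetical helper: a real backend would walk a richer query AST and
    emit Uproot/Awkward calls. This only shows the AST -> source-code step.
    """
    tree = ast.parse(query, mode="eval")
    lam = tree.body
    if not isinstance(lam, ast.Lambda):
        raise ValueError("query must be a single lambda expression")
    # Reuse the lambda's argument list and body inside a generated 'def'.
    args = ast.unparse(lam.args)
    body = ast.unparse(lam.body)
    return f"def {func_name}({args}):\n    return {body}\n"


# Example query: keep jet pT values above 30 (units assumed to be GeV).
source = generate_function_source(
    "lambda events: [pt for pt in events['jet_pt'] if pt > 30.0]"
)

# Compile the generated source and evaluate it on toy in-memory data,
# which stands in here for the contents of a data file.
namespace = {}
exec(source, namespace)
selected = namespace["run_query"]({"jet_pt": [12.5, 45.0, 31.2, 8.0]})
print(source)
print(selected)
```

Generating source text rather than interpreting the AST directly keeps the result inspectable: the emitted function can be printed, reviewed, and run independently of the query machinery.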

Summary

Introduction

Analyses of high energy physics (HEP) data typically consist of executing a carefully designed algorithm on millions of collision events. One of the most common ways to write analysis code is essentially a for-loop over events, generally using C++ and the ROOT software framework [1, 2]. This procedure tightly couples the analysis code to both the file format of the data and the steps of program execution. Two important aspects of database management are data query languages and data independence [5]. The objective of the FuncADL project is to provide an interface to the data via a query language that maintains data independence. The project encompasses several aspects: the query language, code generation, and implementations that tie these two steps together. A simple implementation is demonstrated with example analysis tasks drafted by the HEP software community, followed by a discussion of implications and future plans.
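
The contrast between an imperative event loop and a declarative query can be made concrete with a toy sketch in plain Python. It does not use the FuncADL API: the `Query` class, its `Where`/`Select`/`execute` methods, and the `met` and `n_jets` fields are hypothetical stand-ins, and events are plain dictionaries rather than data read from a file.

```python
# Toy events; field names and values are invented for illustration.
events = [
    {"met": 25.0, "n_jets": 2},
    {"met": 80.0, "n_jets": 4},
    {"met": 55.0, "n_jets": 1},
]

# Imperative style: the loop spells out how to iterate and accumulate,
# so the analysis logic is tied to the in-memory layout and execution order.
selected_loop = []
for event in events:
    if event["met"] > 50.0:
        selected_loop.append(event["n_jets"])


class Query:
    """Hypothetical declarative query: records steps, runs nothing until asked."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def Where(self, predicate):
        # Return a new query with a filter step appended.
        return Query(self._steps + (("where", predicate),))

    def Select(self, transform):
        # Return a new query with a transformation step appended.
        return Query(self._steps + (("select", transform),))

    def execute(self, source):
        # One possible executor; a different backend could interpret
        # the same recorded steps against a different storage format.
        rows = iter(source)
        for kind, func in self._steps:
            rows = filter(func, rows) if kind == "where" else map(func, rows)
        return list(rows)


# Declarative style: the query states what to select, not how to loop.
query = Query().Where(lambda e: e["met"] > 50.0).Select(lambda e: e["n_jets"])
selected_query = query.execute(events)
print(selected_loop, selected_query)
```

Because the query object only describes the selection, the same description could in principle be handed to different executors, which is the sense in which the query is independent of storage and execution.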

Query language
Code generation
Implementations
Query examples
Discussion
Conclusions
