Abstract

The physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain, and analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding-edge hardware technologies and allow parallelism to be expressed seamlessly. This document discusses RDataFrame, the declarative analysis engine of ROOT, and details how it profitably exploits commodity hardware as well as high-end servers and manycore accelerators thanks to the synergy with the existing parallelised ROOT components. Real-life analyses of LHC experiments’ data expressed in terms of RDataFrame are presented, highlighting the programming model that lets them be written in a concise and powerful way. The recent developments which make RDataFrame a lightweight data processing framework, such as callbacks and I/O capabilities, are described. Finally, the flexibility of RDataFrame and its ability to read data formats other than ROOT’s are characterised: as an example, it is discussed how RDataFrame can directly read and analyse LHCb’s raw data format, MDF.

Highlights

  • The ROOT project is committed to taking physicists from data acquisition to publication as effectively as possible

  • The need to offer analysts simpler and yet powerful interfaces that could let them exploit the full potential of their hardware became all the more apparent with the increased luminosity and the upgrades of the LHC experiments foreseen for Run III [1], HL-LHC [2] and FCC [3] – with the consequent increase in the amount and complexity of available data

  • Novel elements introduced by RDataFrame are the choice of programming language (C++), which allows usage of template metaprogramming to avoid runtime overhead while maintaining generality of interfaces, the integration of just-in-time compilation of user-defined expressions to make analysis definition concise when top performance is not required, and a tight integration with the rest of the ROOT data analysis toolkit

Summary

Introduction

The ROOT project is committed to taking physicists from data acquisition to publication as effectively as possible. ROOT::RDataFrame has been developed to address these requirements. In a similar vein to other modern data analysis frameworks such as Apache Spark’s DataFrames [5] and Python’s data analysis library pandas [6], RDataFrame exposes a declarative API designed to be easy to use correctly and hard to use incorrectly. Novel elements introduced by RDataFrame are the choice of programming language (C++), which allows template metaprogramming to be used to avoid runtime overhead while maintaining generality of interfaces; the integration of just-in-time compilation of user-defined expressions, which makes analysis definitions concise when top performance is not required; and a tight integration with the rest of the ROOT data analysis toolkit. One real-world application of the framework and performance benchmarks are discussed in sections 4 and 5, respectively.
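The two design ideas named above, a declarative interface and templates that avoid runtime overhead, can be illustrated with a toy sketch. This is not RDataFrame's implementation or API: the names LazyColumn and MakeLazyColumn are hypothetical, the code uses only the C++14 standard library, and it handles a single column of doubles. It only shows how cuts can be declared first and evaluated later in one loop, with each predicate's type known at compile time so that no std::function or virtual call is needed per entry.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy type, NOT part of ROOT: a lazy, declarative selection
// over one column of doubles. The predicate type is a template parameter,
// so the compiler sees the concrete callable and can inline it.
template <typename Pred>
class LazyColumn {
public:
    LazyColumn(const std::vector<double> &data, Pred pred)
        : data_(&data), pred_(std::move(pred)) {}

    // Declare an additional cut. Nothing is evaluated here: the new
    // predicate is composed with the existing one and a new node is returned.
    template <typename F>
    auto Filter(F f) const {
        auto both = [p = pred_, f = std::move(f)](double x) {
            return p(x) && f(x);
        };
        return LazyColumn<decltype(both)>(*data_, std::move(both));
    }

    // Action: runs the loop over the data and counts the selected entries.
    std::size_t Count() const {
        std::size_t n = 0;
        for (double x : *data_)
            if (pred_(x)) ++n;
        return n;
    }

    // Action: runs the loop over the data and sums the selected entries.
    double Sum() const {
        double s = 0.0;
        for (double x : *data_)
            if (pred_(x)) s += x;
        return s;
    }

private:
    const std::vector<double> *data_;  // non-owning; data must outlive the node
    Pred pred_;
};

// Hypothetical helper: start a selection that accepts every entry.
inline auto MakeLazyColumn(const std::vector<double> &data) {
    auto all = [](double) { return true; };
    return LazyColumn<decltype(all)>(data, all);
}
```

With a vector v, a chain such as MakeLazyColumn(v).Filter([](double x) { return x > 0; }).Count() only declares the cut in Filter; the data is traversed when Count is called. The sketch stores a pointer to the input, so the vector must outlive the selection object.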

RDataFrame’s software design
Recently introduced features
User-defined callbacks
A real-world RDataFrame application
Scaling and performance benchmarks
Conclusions and Future Work