Abstract

With the LHC continuing to collect more data and experimental analyses becoming increasingly complex, tools to efficiently develop and execute these analyses are essential. The bamboo framework defines a domain-specific language, embedded in python, that allows the analysis logic to be expressed concisely and in a functional style. The implementation, based on ROOT’s RDataFrame and the cling C++ JIT compiler, approaches the performance of dedicated native code. Bamboo is currently used for several CMS Run 2 analyses that rely on the NanoAOD data format, which will become more common in Run 3 and beyond and for which many reusable components are included; at the same time it provides many possibilities for customisation, allowing straightforward adaptation to other formats and workflows.
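A minimal sketch of what such an analysis module can look like is given below, closely following the style of the dimuon example in the bamboo documentation. The names used (NanoAODHistoModule, definePlots, op.select, op.rng_len, Plot.make1D, EquidistantBinning) come from that documented interface, but exact signatures may differ between versions, so this should be read as an illustration of the style rather than a verified recipe.

    from bamboo.analysismodules import NanoAODHistoModule
    from bamboo.plots import Plot, EquidistantBinning
    from bamboo import treefunctions as op

    class DimuonPlots(NanoAODHistoModule):
        """Example module: select events with two muons and plot the dimuon mass."""
        def definePlots(self, t, noSel, sample=None, sampleCfg=None):
            plots = []
            # Derived collection: muons above a pT threshold, expressed with a lambda
            muons = op.select(t.Muon, lambda mu: mu.pt > 20.)
            # Refine the base selection: require at least two selected muons
            twoMuSel = noSel.refine("dimu", cut=[op.rng_len(muons) > 1])
            # Book a histogram of the dimuon invariant mass for the refined selection
            plots.append(Plot.make1D("dimu_M",
                op.invariant_mass(muons[0].p4, muons[1].p4), twoMuSel,
                EquidistantBinning(100, 20., 120.),
                title="Dimuon invariant mass"))
            return plots

All of this is declarative: the python expressions are translated into an RDataFrame computation graph, which is then JIT-compiled and run in a single pass over the input events.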

Highlights

  • As the LHC moves to a third run and a high-luminosity phase, where the gain in physics reach comes more from the increase in the size of the data samples than from an increase in centre-of-mass energy, the focus of experimental measurements shifts towards improving precision

  • Data collected over longer periods of time are jointly analyzed, and calibrations, corrections, and data analysis techniques become increasingly fine-grained and sophisticated. This has an impact on data analysis workflows, where there is a trend towards more compact and standardized data formats, e.g. the introduction of the NanoAOD format in the CMS collaboration [1], which stores about 1 kB of information per collision or simulated event, and is foreseen to cover the needs of a major fraction of the measurements and searches

  • The bamboo package avoids the trade-off between rapid development in a high-level language and the execution speed of native code by providing a flexible python framework with a high-level interface to construct computation graphs that can efficiently be executed by RDataFrame [2] (RDF), a declarative columnar interface for data analysis in the ROOT framework [3, 4]; a short RDF sketch is given below
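To make the connection with RDF concrete, the short PyROOT sketch below builds such a declarative computation graph directly: a filter, a defined column, and a booked histogram, all evaluated lazily in a single event loop. The Filter, Define, and Histo1D calls are part of ROOT’s documented RDataFrame interface; the tree name "Events" and the file name are placeholders for a NanoAOD-like input.

    import ROOT

    # Let RDataFrame parallelise the single event loop over the available cores
    ROOT.EnableImplicitMT()

    # "Events" and "nanoaod.root" are placeholder names for a NanoAOD-like input
    df = ROOT.RDataFrame("Events", "nanoaod.root")

    # Nodes of the computation graph are only booked here; nothing runs yet
    hist_result = (df.Filter("nMuon >= 2", "at least two muons")
                     .Define("mu0_pt", "Muon_pt[0]")
                     .Histo1D(("mu0_pt", "Leading muon p_{T};p_{T} [GeV];Events",
                               50, 0., 200.), "mu0_pt"))

    # Accessing the result triggers the event loop, filling all booked results in one pass
    hist = hist_result.GetValue()
    print(f"Selected events: {int(hist.GetEntries())}")

Bamboo constructs graphs of this kind programmatically from the python expressions written by the analyst, so the same lazy, single-pass execution model applies without writing RDF calls by hand.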

Summary

Introduction

As the LHC moves to a third run and a high-luminosity phase, where the gain in physics reach comes more from the increase in the size of the data samples than from an increase in centre-of-mass energy, the focus of experimental measurements shifts towards improving precision. Data collected over longer periods of time are jointly analyzed, and calibrations, corrections, and data analysis techniques become increasingly fine-grained and sophisticated. This has an impact on data analysis workflows, where there is a trend towards more compact and standardized data formats, e.g. the introduction of the NanoAOD format in the CMS collaboration [1], which stores about 1 kB of information per collision or simulated event and is foreseen to cover the needs of a major fraction of the measurements and searches. Ideas for future developments (Section 6) and a conclusion (Section 7) are also presented.

Event view and constructing derived quantities
Task management
Future directions
Conclusion
