Abstract

The ROOT TTree data format encodes hundreds of petabytes of High Energy and Nuclear Physics events. Its columnar layout enables fast analyses, because only those parts (“branches”) that are actually used in a given analysis need to be read from storage. Its unique feature is the seamless C++ integration, which lets users store their event classes directly without explicitly defining data schemas. In this contribution, we present the status and plans of the future ROOT 7 event I/O. Along with the ROOT 7 interface modernization, we aim for robust and, where possible, compile-time-safe C++ interfaces to read and write event data. On the performance side, we show first benchmarks using ROOT’s new experimental I/O subsystem, which combines the best of TTrees with recent advances in columnar data formats. A core ingredient is a strong separation of the high-level logical data layout (C++ classes) from the low-level physical data layout (storage-backed nested vectors of simple types). We show how the new, optimized physical data layout speeds up serialization and deserialization and facilitates parallel, vectorized and bulk operations. This lets ROOT I/O run optimally on the upcoming ultra-fast NVRAM storage devices, as well as on file-less storage systems such as object stores.
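To make the split between logical and physical layout concrete, here is a minimal sketch in plain C++ (not ROOT or RNTuple code). It assumes a hypothetical event class `Event` holding a variable-length `std::vector<Track>` and decomposes a sequence of such events into one offset column for the collection plus one flat column per member (`pt`, `eta`). All names (`Track`, `Event`, `TrackColumns`, `Decompose`, `Reconstruct`) are illustrative only, and the actual RNTuple column encoding may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical logical layout: the C++ classes a user would write.
struct Track {
    float pt;
    float eta;
};

struct Event {
    std::vector<Track> tracks;  // variable-length collection of sub-records
};

// Hypothetical physical layout: one offset column for the collection plus
// one flat column per leaf member. Offsets are cumulative, so event i owns
// elements [offsets[i-1], offsets[i]), with an implicit 0 before event 0.
struct TrackColumns {
    std::vector<std::uint64_t> offsets;
    std::vector<float> pt;
    std::vector<float> eta;
};

// Split a sequence of events into flat columns of simple types.
TrackColumns Decompose(const std::vector<Event>& events) {
    TrackColumns cols;
    std::uint64_t total = 0;
    for (const auto& ev : events) {
        for (const auto& t : ev.tracks) {
            cols.pt.push_back(t.pt);
            cols.eta.push_back(t.eta);
        }
        total += ev.tracks.size();
        cols.offsets.push_back(total);
    }
    return cols;
}

// Rebuild event i from the columns (the inverse of Decompose).
Event Reconstruct(const TrackColumns& cols, std::size_t i) {
    assert(i < cols.offsets.size());
    const std::uint64_t begin = (i == 0) ? 0 : cols.offsets[i - 1];
    const std::uint64_t end = cols.offsets[i];
    Event ev;
    for (std::uint64_t j = begin; j < end; ++j) {
        ev.tracks.push_back({cols.pt[j], cols.eta[j]});
    }
    return ev;
}
```

Because each column is a contiguous array of a single simple type, it can be read, compressed and processed in bulk independently of the others, while the offset column alone is enough to recover the event boundaries.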

Highlights

  • The data describing a High Energy Physics (HEP) event is typically represented by a record containing variable-length collections of sub-records

  • A typical physics analysis uses a large number of events but processes only a subset of the available properties

  • We intend to add a limited C API for RNTuple to make it easier to transfer ROOT data to third-party consumers, such as NumPy arrays or machine learning toolkits
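The C API is only announced as a plan, so its actual shape is not specified here. One common way for such an interface to hand data to NumPy or machine learning toolkits is a zero-copy view: a C-compatible struct carrying a raw pointer and an element count. The sketch below assumes exactly that design; `ColumnViewF32`, `ColumnStore` and `column_store_get_pt` are hypothetical names and are not part of ROOT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

extern "C" {
// Hypothetical C-compatible descriptor of one in-memory column of floats.
struct ColumnViewF32 {
    const float* data;    // first element, owned by the C++ side
    std::uint64_t size;   // number of elements
};
}

// Hypothetical owner of decoded column data on the C++ side; C callers
// only ever see it as an opaque pointer.
struct ColumnStore {
    std::vector<float> pt;
};

// Hypothetical C entry point: exposes the column without copying. The view
// stays valid only while the store is alive and its column is unmodified.
extern "C" ColumnViewF32 column_store_get_pt(const ColumnStore* store) {
    assert(store != nullptr);
    return ColumnViewF32{store->pt.data(),
                         static_cast<std::uint64_t>(store->pt.size())};
}
```

On the Python side, a pointer and length like these could be wrapped without copying, for example with `numpy.ctypeslib.as_array`; the caller has to keep the owning store alive for as long as the array is in use.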


Summary

Introduction

The data describing a High Energy Physics (HEP) event is typically represented by a record containing variable-length collections of sub-records. ROOT’s TTree storage format supports a columnar physical data layout for nested sub-records and collections [1]. Values of a single property of many events (e.g., pt for events 1 to 1000) are stored consecutively on disk, so only those parts that are required for an analysis need to be read. The RNTuple classes provide a new, experimental columnar event I/O system that is backward-incompatible with TTree at both the file format level and the API level. This section describes key design choices of the RNTuple data format, its class design, and its interfaces.
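The benefit of storing each property consecutively is that a reader can fetch one column without touching the others. The following toy sketch, in plain C++ and without ROOT, models a file as a map from column name to one contiguous byte buffer and counts how many bytes a read actually touches. `ToyColumnarFile` and its methods are hypothetical; real TTree and RNTuple files additionally split columns into compressed baskets or pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a columnar file: each named column is one contiguous
// byte buffer holding the values of a single property for all events.
class ToyColumnarFile {
public:
    // Store the values of one property for all events consecutively.
    void WriteColumn(const std::string& name, const std::vector<float>& values) {
        std::vector<unsigned char>& buf = columns_[name];
        const std::size_t nbytes = values.size() * sizeof(float);
        buf.resize(nbytes);
        if (nbytes > 0) std::memcpy(buf.data(), values.data(), nbytes);
    }

    // Read back a single column; the other columns are never touched.
    std::vector<float> ReadColumn(const std::string& name) {
        const std::vector<unsigned char>& buf = columns_.at(name);
        assert(buf.size() % sizeof(float) == 0);
        std::vector<float> values(buf.size() / sizeof(float));
        if (!buf.empty()) std::memcpy(values.data(), buf.data(), buf.size());
        bytesRead_ += buf.size();
        return values;
    }

    std::size_t BytesRead() const { return bytesRead_; }

private:
    std::map<std::string, std::vector<unsigned char>> columns_;
    std::size_t bytesRead_ = 0;
};
```

With three float columns (`pt`, `eta`, `phi`) of three events each and 4-byte floats, reading only `pt` touches 12 of the 36 stored bytes; in a row-wise layout, `pt` would be interleaved with the other properties, and all of them would have to be read.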

Data layout
Class design
RNTuple user interfaces
Performance evaluation
Storage efficiency
SSD optimizations
Optane DC NV-RAM evaluation
Findings
Conclusion