Laurelin: Java-native ROOT I/O for Apache Spark

Andrew Melo, Oksana Shadura, for the CMS Collaboration

doi:10.1051/epjconf/202125102072

Open Access

Abstract

Apache Spark[1] is one of the predominant frameworks in the big data space, providing a fully functional query processing engine, vendor support for hardware accelerators, and performant integrations with scientific computing libraries. One difficulty in adapting conventional big data frameworks to HEP workflows is that these frameworks lack support for the ROOT file format. Laurelin[6] implements ROOT I/O as a pure Java library, with no bindings to the C++ ROOT[2] implementation, and is readily installable via standard Java packaging tools. It provides a performant interface that lets Spark read (and soon write) ROOT TTrees, so users can process these data without a pre-processing phase that converts them to an intermediate format.

Highlights

High Energy Physics (HEP) experiments, like those performed at the Large Hadron Collider (LHC), generate an enormous amount of data which must be captured, stored, refined, and later analyzed by thousands of users.
Reconstruction passes are performed over the initial data to ‘connect the dots’ of detector hits to determine the most probable types and trajectories of particles passing through the detector. This process results in datasets that are apt to probe the physical processes that occurred at the interaction point.
When a user requests a new DataFrame backed by ROOT files, Spark will greedily instantiate a ‘Table’ class to represent the dataset backed by those files.
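
As a rough illustration of what requesting a DataFrame backed by ROOT files can look like from PySpark, here is a minimal sketch. The Maven coordinate (including its version), the `root` format name, and the `tree` option are assumptions based on the Laurelin project's published usage notes, not on this text. The helper names `read_ttree` and `build_session` and the default tree name `Events` are hypothetical.

```python
from typing import Any

# Assumed values: based on the Laurelin project's usage notes, not on this text.
LAURELIN_PACKAGE = "edu.vanderbilt.accre:laurelin:1.0.0"  # Maven coordinate; version is an assumption
LAURELIN_FORMAT = "root"  # assumed short name of the Spark data source
TREE_OPTION = "tree"      # assumed option that names the TTree to read


def read_ttree(spark: Any, path: str, tree: str = "Events") -> Any:
    """Ask Spark for a DataFrame backed by the TTree `tree` in the ROOT file(s) at `path`.

    `spark` is expected to be a pyspark.sql.SparkSession (hypothetical helper).
    """
    return spark.read.format(LAURELIN_FORMAT).option(TREE_OPTION, tree).load(path)


def build_session() -> Any:
    """Create a local SparkSession that fetches the Laurelin jar via spark.jars.packages."""
    # Imported lazily so that read_ttree itself has no hard dependency on pyspark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master("local[1]")
        .config("spark.jars.packages", LAURELIN_PACKAGE)
        .getOrCreate()
    )
```

With PySpark installed and a ROOT file at hand (the file name `events.root` is a placeholder), `df = read_ttree(build_session(), "events.root")` followed by `df.printSchema()` would show the columns Spark derives from the TTree, with no intermediate conversion step.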

Summary

Introduction

High Energy Physics (HEP) experiments, like those performed at the Large Hadron Collider (LHC), generate an enormous amount of data which must be captured, stored, refined, and later analyzed by thousands of users. These data are produced from the collisions of billions of subatomic particles at speeds approaching the speed of light, as well as from sophisticated simulations of the various expected physical processes. Reconstruction passes are performed over the initial data to ‘connect the dots’ of detector hits to determine the most probable types and trajectories of particles passing through the detector. This process results in datasets that are apt to probe the physical processes that occurred at the interaction point. Because many tools like Spark were developed outside of HEP, there is little native support for the ROOT file format in the Big Data ecosystem. In HEP, this is the exception rather than the rule because most analyses will focus on a small fraction of the particles from any given Event.

Architecture

ROOT file format

Apache Spark DataSource API

Components

Interpretation

Spark DataSource

Conclusions and Next Steps

Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2021
License type: CC BY 4.0