Using Big Data Technologies for HEP Analysis

M Cremonesi, Evangelos Motesnitsalis, Vasileios Dimakopoulos, Claudio Bellini, Bianny Bian, Luca Canali, P Elmer, J Pazzini, B Jayatilaka, A Svyatkovskiy, I Fisk, Siew-Yan Hoh, M Girone, J Pivarski, A Melo, D Olivito, Andrea Luiselli, O Gutsche, V Khristenko, M Zanetti

doi:10.1051/epjconf/201921406030

Abstract

The HEP community is approaching an era where the excellent performance of particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could become a limiting factor in the capability to produce scientific results in a timely and efficient manner. Recently, new technologies and approaches have been developed in industry to meet the need to retrieve information as quickly as possible when analyzing PB and EB datasets. Providing scientists with these modern computing tools will lead to rethinking the principles of data analysis in HEP, making the overall scientific process faster and smoother. In this paper, we present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. The study aims at evaluating the efficiency of applying the new tools both quantitatively, by measuring performance, and qualitatively, by focusing on the user experience. The first goal is achieved by developing a data reduction facility: working together with CERN openlab and Intel, CMS replicates a real physics search using Spark-based technologies, with the ambition of reducing 1 PB of public data collected by the CMS experiment to 1 TB of data in a format suitable for physics analysis within 5 hours. The second goal is achieved by implementing multiple physics use cases in Apache Spark, using as input preprocessed datasets derived from official CMS data and simulation. By performing different end-analyses up to the publication plots on different hardware, feasibility, usability and portability are compared to those of a traditional ROOT-based workflow.
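
As a rough illustration of what the data reduction target implies, the sustained input rate and the overall reduction factor can be worked out from the figures quoted in the abstract. The sketch below assumes decimal (SI) units, which the abstract does not specify, and it is not taken from the paper itself.

```python
# Back-of-the-envelope check of the data reduction target quoted in the abstract.
# Assumption (not stated in the source): decimal SI units,
# i.e. 1 PB = 10**15 bytes, 1 TB = 10**12 bytes, 1 GB = 10**9 bytes.

PB = 10**15
TB = 10**12
GB = 10**9

input_bytes = 1 * PB       # public CMS data to be reduced
output_bytes = 1 * TB      # reduced dataset suitable for physics analysis
wall_time_s = 5 * 3600     # target of 5 hours, in seconds

# Average rate at which input data must be read to meet the target.
throughput_gb_s = input_bytes / wall_time_s / GB

# Ratio of input size to output size.
reduction_factor = input_bytes // output_bytes

print(f"required read throughput: {throughput_gb_s:.1f} GB/s")  # 55.6 GB/s
print(f"reduction factor: {reduction_factor}")                  # 1000
```

Under these assumptions, the target corresponds to reading roughly 55.6 GB/s on average across the whole cluster while shrinking the data by a factor of 1000.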

Highlights

The scientific method is based on comparing predictions to experimental data, in order to confirm or disprove new theories
In high energy physics (HEP), such data are collected by an experimental apparatus that can detect fundamental particles once they are produced in the collision of beams provided by accelerators like the LHC at CERN
We focus on the application of Apache Spark [1] to the HEP analysis problem

Summary

Introduction

The scientific method is based on comparing predictions to experimental data, in order to confirm or disprove new theories. We present the latest developments and the most recent results on the usage of Apache Spark for HEP analysis. One goal of the study is achieved by implementing multiple physics use cases in Apache Spark, using as input preprocessed datasets derived from official CMS data and simulation.

Journal: EPJ Web of Conferences	Publication Date: Jan 1, 2019
Citations: 2	License type: CC BY 4.0

Similar Papers

CMS Analysis and Data Reduction with Apache Spark
Oliver Gutsche ... Peter Elmer
Journal of Physics: Conference Series | VOL. 1085
01 Sep 2018

Big Data in HEP: A comprehensive use case study
Oliver Gutsche ... Nhan Tran
Journal of Physics: Conference Series | VOL. 898
01 Oct 2017

Performance Comparison of State of Art NoSql Technologies Using Apache Spark
Anwar Ul Haque ... Nassar Ikram
08 Nov 2018

CERN openlab: Engaging industry for innovation in the LHC Run 3-4 R&D programme
M Girone ... A Di Meglio
Journal of Physics: Conference Series | VOL. 898
01 Oct 2017
