Abstract

With the new generation of High Energy Physics (HEP) experiments, huge amounts of data are being generated. Efficient parallel algorithms/frameworks and high I/O throughput are key to meeting the scalability and performance requirements of HEP offline data analysis. Although Hadoop has attracted much attention from the scientific community for its scalability and its parallel computing framework for large data sets, it is still difficult to run HEP data processing tasks directly on Hadoop. In this paper we investigate how to adapt Hadoop so that HEP jobs can run on it transparently. In particular, we discuss a new mechanism that provides HEP software with random access to data in HDFS. HDFS is a streaming data store that supports only sequential writes and appends, so it cannot satisfy the random-access requirements of HEP jobs. The new feature allows Map/Reduce tasks to perform random reads and writes on the local file system of data nodes instead of using the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We also develop MapReduce models for diverse HEP jobs such as Corsika simulation, ARGO detector simulation, and Medea++ reconstruction, and we provide a toolkit for users to submit, query, and remove jobs. In addition, we provide cluster monitoring and an accounting system to improve system availability. This work has been in production for HEP experiments since September 2016, delivering about 40,000 CPU hours per month.
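
To make the random-access idea concrete, the following is a minimal sketch in Java under stated assumptions: the abstract gives no API details, so the way a task obtains a datanode-local path is hypothetical here, and the sketch is not the authors' implementation. It only illustrates that once the input data resides on the local file system of the data node, ordinary random reads and in-place writes become possible, which the HDFS streaming interface does not allow.

// Illustrative sketch only: the path handling is hypothetical, since the
// abstract does not describe the actual interface. The point is that a task
// operates on a file on the data node's local file system, where random
// reads and in-place writes work, instead of going through the sequential
// HDFS streaming client.
import java.io.IOException;
import java.io.RandomAccessFile;

public class LocalRandomAccessSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical: a datanode-local path onto which the framework has
        // mapped this task's input data.
        String localReplicaPath = args[0];

        try (RandomAccessFile raf = new RandomAccessFile(localReplicaPath, "rw")) {
            // Random read: seek to an arbitrary offset, as HEP I/O requires
            // (assumes the file is large enough for this example offset).
            raf.seek(4096);
            byte[] header = new byte[256];
            raf.readFully(header);

            // Random write: update a region in place, which the standard HDFS
            // client cannot do (it supports only sequential write and append).
            raf.seek(1024);
            raf.write(new byte[] {0x01, 0x02, 0x03});
        }
    }
}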
