Abstract
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of the Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing across hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project, titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA), is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute", together with ALICE distributed computing and ORNL computing professionals. Our approach to the integration of HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system.
We present our current accomplishments in running the PanDA WMS at the OLCF and other supercomputers, and demonstrate our ability to use PanDA as a portal, independent of the underlying computing facility infrastructure, for High Energy and Nuclear Physics as well as other data-intensive science applications.
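The abstract describes PanDA presenting hundreds of heterogeneous data centers as a single facility. The key architectural idea behind such workload managers is the pilot (pull) model: a central server holds the job queue, and lightweight pilots running on each resource pull work that fits their local capabilities. The following Python sketch illustrates that model in miniature; all class and function names are illustrative assumptions, not the real PanDA API.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Job:
    job_id: int
    cores_required: int

@dataclass
class PilotReport:
    site: str
    job_id: int
    status: str

class WorkloadServer:
    """Central job queue: pilots ask it for work instead of being pushed jobs."""
    def __init__(self):
        self.queue = deque()
        self.reports = []

    def submit(self, job: Job):
        self.queue.append(job)

    def get_job(self, free_cores: int):
        # Hand out the first queued job the requesting pilot can actually run;
        # this is how heterogeneous sites (grid, cloud, HPC) coexist behind
        # one queue without the server knowing site details in advance.
        for job in list(self.queue):
            if job.cores_required <= free_cores:
                self.queue.remove(job)
                return job
        return None

    def report(self, r: PilotReport):
        self.reports.append(r)

def run_pilot(server: WorkloadServer, site: str, free_cores: int):
    """A pilot: probe local resources, pull a matching job, run it, report back."""
    job = server.get_job(free_cores)
    if job is None:
        return None  # nothing suitable queued; pilot exits quietly
    server.report(PilotReport(site=site, job_id=job.job_id, status="done"))
    return job.job_id

server = WorkloadServer()
server.submit(Job(job_id=1, cores_required=16))
server.submit(Job(job_id=2, cores_required=4))

# A large HPC pilot pulls the first job; a small grid pilot pulls the second.
print(run_pilot(server, "OLCF", free_cores=10000))      # 1
print(run_pilot(server, "grid-site", free_cores=8))     # 2
```

Because pilots pull rather than receive pushed work, adding an opportunistic cloud or an LCF resource only requires starting pilots there; the central queue is unchanged, which matches the abstract's goal of reusing existing PanDA components.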
Highlights
The largest scientific instrument in the world, the Large Hadron Collider, is operating at the CERN Laboratory in Geneva, Switzerland [1]
The ATLAS [2], ALICE [3] and other Large Hadron Collider (LHC) experiments explore the fundamental nature of matter and the basic forces that shape our universe
To address an unprecedented multi-petabyte data processing challenge, the LHC experiments rely on the computational grid infrastructure deployed in the framework of the Worldwide LHC Computing Grid (WLCG) [4]
Summary
A. Klimentov, P. Buncic, K. De, S. Jha, T. Maeno, R. Mount, P. Nilsson, D. Oleynik, S. Panitkin, A. Petrosyan, R. J. Porter, K. F. Read, A. Vaniachine, J. C. Wells and T. Wenaus, "Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing", J. Phys.: Conf. Ser. 608 012040 (http://iopscience.iop.org/1742-6596/608/1/012040)