Monitoring data transfer latency in CMS computing operations

D Bonacorsi, A Sartirana, N Magini, M Taze, T Wildish, T Diotalevi

doi:10.1088/1742-6596/664/3/032033

Abstract

During the first LHC run, the CMS experiment collected tens of Petabytes of collision and simulated data, which need to be distributed among dozens of computing centres with low latency in order to make efficient use of the resources. While the desired level of throughput has been successfully achieved, it is still common to observe transfer workflows that cannot reach full completion in a timely manner due to a small fraction of stuck files which require operator intervention.

For this reason, in 2012 the CMS transfer management system, PhEDEx, was instrumented with a monitoring system to measure file transfer latencies, and to predict the completion time for the transfer of a data set. The operators can detect abnormal patterns in transfer latencies while the transfer is still in progress, and monitor the long-term performance of the transfer infrastructure to plan the data placement strategy.

Based on the data collected for one year with the latency monitoring system, we present a study on the different factors that contribute to transfer completion time. As case studies, we analyze several typical CMS transfer workflows, such as distribution of collision event data from CERN or upload of simulated event data from the Tier-2 centres to the archival Tier-1 centres. For each workflow, we present the typical patterns of transfer latencies that have been identified with the latency monitor.

We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention. We propose a set of metrics to alert about stuck subscriptions and prompt for manual intervention, with the aim of improving transfer completion times.
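The idea of a transfer that delivers most of its files quickly and then stalls on a small tail can be illustrated with a minimal Python sketch. It is not the PhEDEx implementation and not the metrics proposed in the paper: the names `LatencySummary`, `summarize` and `looks_stuck`, the 95% bulk fraction and the tail factor of 3 are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LatencySummary:
    """Hypothetical per-transfer summary; field names are illustrative only."""
    total_files: int
    done_files: int
    bulk_latency: Optional[float]  # seconds until the bulk fraction of files arrived
    last_latency: Optional[float]  # seconds until the last file arrived, None if incomplete


def summarize(start: float, arrivals: Sequence[float], total_files: int,
              bulk_fraction: float = 0.95) -> LatencySummary:
    """Summarize one transfer from its start time and per-file arrival times."""
    elapsed = sorted(t - start for t in arrivals)
    needed = math.ceil(bulk_fraction * total_files)  # files that make up the "bulk"
    bulk = elapsed[needed - 1] if len(elapsed) >= needed else None
    last = elapsed[-1] if elapsed and len(elapsed) == total_files else None
    return LatencySummary(total_files, len(elapsed), bulk, last)


def looks_stuck(summary: LatencySummary, now_elapsed: float,
                tail_factor: float = 3.0) -> bool:
    """Flag a transfer whose bulk has arrived but whose tail is overdue.

    Assumed rule: the transfer is suspicious once the time since its start
    exceeds tail_factor times the time the bulk needed to arrive.
    """
    if summary.last_latency is not None:
        return False  # transfer already complete
    if summary.bulk_latency is None:
        return False  # bulk still in progress, too early to judge the tail
    return now_elapsed > tail_factor * summary.bulk_latency
```

Separating the bulk latency from the latency of the last file keeps the throughput-dominated part of a transfer apart from the tail, which is where the stuck files described above would show up. In practice any thresholds would have to be tuned against real latency data.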

Highlights

We identify the areas in PhEDEx where a development effort can reduce the latency, and we show how we are able to detect stuck transfers which need operator intervention
The CMS experiment [1] at the LHC accelerator is concluding the first Long Shutdown (LS1) after a successful first run of data taking (Run-1), with Run-2 starting in Summer 2015

Summary

Presented at the 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015). Published by IOP Publishing in Journal of Physics: Conference Series 664 (2015) 032033, doi:10.1088/1742-6596/664/3/032033 (http://iopscience.iop.org/1742-6596/664/3/032033).

Journal: Journal of Physics: Conference Series	Publication Date: Dec 1, 2015
Citations: 3	License type: cc-by
