Abstract
Computing grids are key enablers of computational science. Researchers from many fields (High Energy Physics, Bioinformatics, Climatology, etc.) employ grids for the execution of distributed computational jobs. These computing workloads are typically data-intensive. The current state-of-the-art approach for data access in grids is data placement: a job is scheduled to run at a specific data center, and its execution commences only once the complete input data has been transferred there. An alternative approach is remote data access: a job may stream the input data directly from arbitrary storage elements. Remote data access brings two innovative benefits: (1) the jobs can be executed asynchronously with respect to the data transfer; (2) when combined with data placement on the policy level, it can aid in the optimization of the network load, since the two data access methodologies exhibit partially non-overlapping bottlenecks. However, in order to employ this technique systematically, the properties of its network throughput need to be studied carefully. This paper presents experimentally identified parameters of remote data access throughput, a statistically tested formalization of these parameters, and a derived throughput forecasting model. The model is applicable to large computing workloads, is robust with respect to arbitrary dynamic changes in the grid infrastructure, and exhibits a long-term prediction horizon. Its purpose is to assist various stakeholders of the grid in decision-making related to data access patterns. This work is based on measurements taken on the Worldwide LHC Computing Grid at CERN.
Highlights
In this paper we have demonstrated that the network throughput of remote data access in computing grids can be framed as a multiple linear regression.
The regression needs to be fitted for each worker node–storage element pair.
The estimates of the regression coefficients can be mined from logs in the form of time series (see the sketch below).
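The highlights describe a concrete estimation procedure: for every worker node–storage element pair, throughput observations mined from transfer logs are used to fit an ordinary least-squares regression. The sketch below is a minimal illustration of that per-pair fit, not the paper's implementation; the hostnames, log fields, and the two regressors (concurrent remote reads, requested bytes) are hypothetical placeholders for the experimentally identified parameters.

```python
# Minimal sketch (assumptions noted in comments), not the paper's implementation:
# fit one multiple linear regression of remote-read throughput per
# (worker node, storage element) pair from log-derived samples.
from collections import defaultdict
import numpy as np

# Hypothetical records mined from transfer logs:
# (worker_node, storage_element, concurrent_reads, requested_bytes, throughput_MBps)
# The two regressors are illustrative placeholders, not the parameters
# identified in the paper.
records = [
    ("wn-0042.example.org", "se-dc1.example.org", 12, 3.5e9, 180.0),
    ("wn-0042.example.org", "se-dc1.example.org", 30, 1.2e9, 95.0),
    ("wn-0042.example.org", "se-dc1.example.org", 5, 8.0e8, 210.0),
    ("wn-0042.example.org", "se-dc1.example.org", 18, 2.1e9, 140.0),
]

# Group the samples by (worker node, storage element) pair.
samples = defaultdict(list)
for wn, se, reads, req_bytes, throughput in records:
    samples[(wn, se)].append((reads, req_bytes, throughput))

# Ordinary least squares per pair: throughput ~ b0 + b1*reads + b2*req_bytes.
coefficients = {}
for pair, rows in samples.items():
    data = np.asarray(rows, dtype=float)
    X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    y = data[:, 2]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefficients[pair] = beta

for pair, beta in coefficients.items():
    print(pair, beta)
```

In the paper's setting the coefficient estimates are themselves tracked over time (mined from logs in the form of time series), so a fit like this would be refreshed as new log data arrives.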
Summary
Computing grids allow researchers to analyze experimental data in a highly distributed and parallel fashion. For example, within the Worldwide LHC Computing Grid (WLCG) more than 150 computing sites are employed by the ATLAS experiment at CERN. Under data placement, a job scheduled at data center DC1 whose input replica resides at a remote data center DC2 may commence its execution only after the completion of the following workflow: (1) the input replica is transferred from the relevant storage element at DC2 to a storage element at DC1; (2) the input replica is staged in from the relevant storage element at DC1 to the worker node. This simplistic approach has two major disadvantages: (1) the jobs stay idle while waiting for the input data; (2) due to the limited infrastructure resources, the distributed data management system handling the data placement may queue the transfers for up to several days. In another example, the input replica is already located at the local storage element and only the stage-in step is required. A forecasting model of the network throughput is therefore needed for the coordination of such scientific workloads.
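As a rough illustration of why idle waiting and queued transfers matter, the following back-of-the-envelope comparison contrasts time-to-completion under data placement and under remote data access. All figures (input size, link rates, queue delay, CPU time) are invented for illustration and are not measurements from the paper.

```python
# Illustrative numbers only (assumptions, not measurements from the paper).
input_size_gb = 100.0      # size of the input replica
wan_rate_gbps = 1.0        # DC2 -> DC1 transfer rate
lan_rate_gbps = 10.0       # DC1 storage element -> worker node stage-in rate
remote_rate_gbps = 0.8     # direct remote-read streaming rate
queue_delay_h = 24.0       # queueing in the distributed data management system
compute_hours = 6.0        # CPU time of the job itself

def gb_per_hour(gbps):
    """Convert a link rate in Gbit/s to GB/h."""
    return gbps * 3600.0 / 8.0

# Data placement: the job stays idle until the queued transfer (1) and the
# stage-in (2) have both completed, then it computes.
placement_wait_h = (queue_delay_h
                    + input_size_gb / gb_per_hour(wan_rate_gbps)
                    + input_size_gb / gb_per_hour(lan_rate_gbps))
placement_total_h = placement_wait_h + compute_hours

# Remote data access: streaming overlaps with computation, so completion is
# bounded by the slower of the two (a simplifying assumption).
remote_total_h = max(compute_hours, input_size_gb / gb_per_hour(remote_rate_gbps))

print(f"data placement : {placement_total_h:.1f} h total, {placement_wait_h:.1f} h idle")
print(f"remote access  : {remote_total_h:.1f} h total")
```

The point of the comparison is only qualitative: under data placement the transfer time and any queueing delay are paid before the job starts, whereas remote data access lets the job run asynchronously with respect to the data transfer.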