Abstract

Many techniques, such as scheduling and resource provisioning, rely on performance predictions of workflow tasks for varying input data. However, such estimates are difficult to generate in the cloud. This paper introduces a novel two-stage machine learning approach for predicting workflow task execution times for varying input data in the cloud. To achieve high-accuracy predictions, our approach relies on parameters reflecting runtime information and on two stages of predictions. Empirical results for four real-world workflow applications and several commercial cloud providers demonstrate that our approach outperforms existing prediction methods. In our experiments, our approach achieves a best-case estimation error of 1.6 percent and a worst-case error of 12.2 percent, while existing methods produced errors beyond 20 percent (in some cases even over 50 percent) for more than 75 percent of the evaluated workflow tasks. In addition, we show that the models our approach builds for a specific cloud can be ported to new clouds with low effort and low error, requiring only a small number of executions on the new cloud.
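
To make the two-stage idea concrete, the following is a minimal sketch, not the paper's exact models or parameter set: it assumes scikit-learn regressors, and the feature names and data are invented. Stage one estimates runtime parameters that are unknown before a task executes; stage two predicts the execution time from the input-data features combined with those estimates.

```python
# Minimal sketch of the abstract's two-stage idea -- NOT the paper's exact
# models or parameter set. Assumes scikit-learn; feature names are invented.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X_input = rng.random((200, 3))          # features of the task's input data (hypothetical)
runtime_params = rng.random((200, 2))   # parameters observed at runtime, e.g. bytes read
exec_time = rng.random(200)             # measured execution times (placeholder values)

# Stage 1: estimate the runtime parameters, which are unknown before the
# task runs, from the input-data features alone.
stage1 = RandomForestRegressor(n_estimators=100, random_state=0)
stage1.fit(X_input, runtime_params)

# Stage 2: predict execution time from the input features combined with
# the stage-1 estimates of the runtime parameters.
X_stage2 = np.hstack([X_input, stage1.predict(X_input)])
stage2 = RandomForestRegressor(n_estimators=100, random_state=0)
stage2.fit(X_stage2, exec_time)

# Prediction for unseen input data chains both stages.
X_new = rng.random((5, 3))
t_pred = stage2.predict(np.hstack([X_new, stage1.predict(X_new)]))
```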

Highlights

  • The cloud computing paradigm offers various advantages for scientific applications, including rapid provisioning of resources, pay-per-use pricing, and elastic scaling of resources

  • We propose a performance prediction method, falling into the first category of analytical modeling, to predict the execution time of workflow tasks in clouds

  • We consider two ensemble methods, Bagging [27] and Random Forest [3]; the former has already been applied to performance prediction in the cloud [17] (a sketch comparing the two follows this list)
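
As a companion to the last highlight, the sketch below contrasts the two ensemble methods on synthetic data. It assumes the scikit-learn implementations of Bagging and Random Forest; the paper does not prescribe a particular library, and the data here is invented.

```python
# Comparing the two ensemble methods named in the highlights on synthetic
# data. Assumes scikit-learn; the paper does not prescribe a library.
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((300, 4))                                            # hypothetical task features
y = X @ np.array([2.0, 0.5, 1.0, 3.0]) + rng.normal(0, 0.1, 300)   # synthetic runtimes

# Bagging: bootstrap-aggregated base regressors (decision trees by default).
bagging = BaggingRegressor(n_estimators=100, random_state=0)
# Random Forest: bagging plus random feature selection at each tree split.
forest = RandomForestRegressor(n_estimators=100, random_state=0)

for name, model in [("Bagging", bagging), ("Random Forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.3f}")
```

Random Forest extends plain bagging by also randomizing the features considered at each split, which further decorrelates the ensemble members.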

Summary

Introduction

The cloud computing paradigm offers various advantages for scientific applications, including rapid provisioning of resources, pay-per-use pricing, and elastic scaling of resources. Workflow applications [1] consist of a possibly large number of components, known as workflow tasks, such as legacy programs, data analysis or computational methods, complex simulations, or even smaller subworkflows. These components are connected by data and control flow dependencies. A crucial aspect for scientific workflows is the effective optimization of runtimes, resource usage, and economic costs. These goals can be achieved through different techniques, in particular scheduling, which determines the resource on which to execute each workflow task, and resource provisioning, which determines how many resources of which type are needed [2]. Since cloud infrastructures offer a wide variety of computing resources, execution times may only be known for a subset of cloud providers and for a restricted set of workflow input data.
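
To illustrate how such predictions could feed the scheduling and provisioning decisions mentioned above, consider the following hedged sketch: given predicted execution times of a task on several VM types, choose the cheapest type that still meets a deadline. The VM names, prices, and deadline are invented for illustration; the paper's cited scheduling techniques [2] are not reproduced here.

```python
# Illustrative only: how per-task runtime predictions could feed a simple
# provisioning decision. VM types, prices, and the deadline are invented.
predicted_runtime_s = {"m5.large": 840.0, "m5.xlarge": 430.0, "m5.2xlarge": 230.0}
price_per_hour = {"m5.large": 0.096, "m5.xlarge": 0.192, "m5.2xlarge": 0.384}
deadline_s = 600.0

def cheapest_within_deadline(runtimes, prices, deadline):
    """Return the lowest-cost VM type whose predicted runtime meets the deadline."""
    cost = {vm: prices[vm] * runtimes[vm] / 3600.0
            for vm in runtimes if runtimes[vm] <= deadline}
    if not cost:
        raise ValueError("no VM type meets the deadline")
    return min(cost, key=cost.get)

print(cheapest_within_deadline(predicted_runtime_s, price_per_hour, deadline_s))
# -> m5.xlarge: it meets the 600 s deadline at lower cost than m5.2xlarge
```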
