Abstract

Improving reliability is one of the major concerns of scientific workflow scheduling in clouds. The ever-growing computational complexity and data size of workflows present challenges to fault-tolerant workflow scheduling. Therefore, it is essential to design a cost-effective fault-tolerant scheduling approach for large-scale workflows. In this paper, we propose a dynamic fault-tolerant workflow scheduling (DFTWS) approach with hybrid spatial and temporal re-execution schemes. First, DFTWS calculates the time attributes of tasks and identifies the critical path of the workflow in advance. Then, in the initial resource allocation phase, DFTWS assigns an appropriate virtual machine (VM) to each task according to the task's urgency and budget quota. Finally, DFTWS performs online scheduling, making real-time fault-tolerant decisions based on failure type and task criticality throughout workflow execution. The proposed algorithm is evaluated on real-world workflows, and the factors that affect the performance of DFTWS are analyzed. The experimental results demonstrate that DFTWS achieves a trade-off between the objectives of high reliability and low cost in cloud computing environments.
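
The first step, identifying the critical path in advance, can be illustrated with a minimal sketch. It assumes the common definition of a critical path as the longest path through the workflow DAG when task execution times and inter-task communication times are summed; the paper's own time attributes may differ. The function name `critical_path` and the input dictionaries are hypothetical, chosen for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(exec_time, comm_time, preds):
    """Return (path, length) of the longest path through a workflow DAG.

    exec_time: {task: execution time}
    comm_time: {(u, v): communication time on edge u -> v}
    preds:     {task: set of predecessor tasks}
    These structures are assumptions for this sketch, not the paper's model.
    """
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines each task's start time
    # static_order() yields every task after all of its predecessors.
    for t in TopologicalSorter(preds).static_order():
        start, best_pred[t] = 0.0, None
        for p in preds.get(t, ()):
            cand = finish[p] + comm_time[(p, t)]
            if best_pred[t] is None or cand > start:
                start, best_pred[t] = cand, p
        finish[t] = start + exec_time[t]

    # Walk back from the task that finishes last.
    last = max(finish, key=finish.get)
    length = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = best_pred[last]
    return path[::-1], length


# Hypothetical four-task workflow: a -> {b, c} -> d
exec_time = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
comm_time = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 5}
path, length = critical_path(exec_time, comm_time, preds)
```

In this toy workflow the heavy transfer from c to d puts c, not the longer-running b, on the critical path, which is why communication times need to be included alongside execution times.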

Highlights

  • In recent years, scientific workflow has been applied widely as a new paradigm of data analysis and scientific computation [1].

  • Quantities defined in the notation: execution time of $t_i$; communication time between $t_i$ and $t_j$; critical path; budget quota of $t_i$; reliability of $t_i$ with the spatial re-execution (SRE) scheme; reliability of $t_i$ with the temporal re-execution (TRE) scheme; type of virtual machine (VM) selected by $t_i$; price of VMT.

  • Case 1: $IST - rt \le FOT \le \max_{t_s \in succ(t_i)} (AEET + CT_{is})$, where IST is the instance start time and FOT is the failure occurrence time. The failure occurs during the execution or data transmission of $t_i$; that is, the transient failure recovers after the instance of $t_i$ starts and before all data transfers from $t_i$ to its successors $t_s$ finish.
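
The Case 1 condition can be checked numerically with a short sketch. It takes every quantity in the inequality as a plain number; `rt` is used exactly as it appears in the condition (this excerpt does not define it, and treating it as a non-negative duration is an assumption). The function name `in_case_1_window` and its parameters are hypothetical.

```python
def in_case_1_window(fot, ist, rt, aeet, comm_times):
    """Check IST - rt <= FOT <= max over successors of (AEET + CT_is).

    fot:        failure occurrence time (FOT)
    ist:        instance start time (IST) of task t_i
    rt:         the rt term of the inequality (assumed to be a duration)
    aeet:       the AEET term of the inequality for t_i
    comm_times: CT_is for every successor t_s of t_i
    """
    if not comm_times:
        raise ValueError("t_i needs at least one successor for this bound")
    # Upper bound: the moment the last outgoing data transfer finishes.
    upper = max(aeet + ct for ct in comm_times)
    return ist - rt <= fot <= upper


# Hypothetical values: window is [10 - 1, 15 + 4] = [9, 19]
inside = in_case_1_window(fot=12.0, ist=10.0, rt=1.0, aeet=15.0, comm_times=[2.0, 4.0])
too_late = in_case_1_window(fot=20.0, ist=10.0, rt=1.0, aeet=15.0, comm_times=[2.0, 4.0])
```

Both bounds are inclusive, matching the non-strict inequalities in the condition.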


Summary

Introduction

Scientific workflows have been widely applied as a new paradigm for data analysis and scientific computation [1]. Workflows can be deployed and executed in clouds that provide a virtually infinite resource pool in a pay-as-you-go manner [5]. In this way, workflows can acquire and release cloud resources on demand to achieve a cost-effective operating mode. These advantages make clouds a preferred execution environment for scientific workflows. Without an effective fault-tolerant scheduling scheme, however, failures can prevent deadline-aware workflows from completing on time. In this situation, QoS is severely affected because results might be obtained only after the deadline. We propose a dynamic fault-tolerant workflow scheduling approach with hybrid spatial-temporal re-execution, called DFTWS.

Related Work
Preliminaries
Cloud System
Workflow Model
Fault Tolerance Schemes
Failure Model
Cost Model
DFTWS Algorithm
Static Node Information Calculation
Critical Path Identification
Initial Resource Allocation
Online Scheduling
Experimental Setup
Impact of DM
Impact of FR
Impact of Workflow Structure
Experimental Summary
Findings
Conclusions