Maximizing reliability and performance with reliability-driven task scheduling in heterogeneous distributed computing systems

Hui Wang, Yun Wang

doi:10.1007/s12652-018-0926-9

Abstract

Computing machines and communication links in heterogeneous distributed computing systems (HDCSs) may fail permanently with nonzero probability, and the results of applications running on these systems (e.g., large-scale parallel image processing and neuroimaging) are expected to deteriorate over time. The reliability and performance of applications on HDCSs therefore remain an important open issue, especially when parallel applications are scheduled on graphics processing unit architectures. It is urgent to tackle the problem of maximizing performance and reliability while accounting for the impact of communication and machine failures. This work presents a rigorous probabilistic theory that analytically characterizes the performance and reliability of effective task scheduling in the presence of processor and communication failures. Two algorithms are developed: an optimal communication path search algorithm that considers reliability overhead, and a reliability-driven lookahead scheduling algorithm for precedence-constrained tasks. The theoretical model and experimental data, which are based on randomly generated emulation applications represented by directed acyclic graphs, reveal that the proposed algorithms significantly outperform existing scheduling algorithms in terms of expected makespan, reliability, and schedule length ratio. Weaknesses of the algorithms related to the input parameters are also observed.
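As a rough illustration of two of the metrics named in the abstract, the sketch below computes a schedule's reliability and its schedule length ratio (SLR). It assumes that processor failures follow a Poisson process with a constant rate, so a processor survives t time units with probability exp(-rate * t). This is a common assumption in reliability-aware scheduling work and is not confirmed here as the paper's own model. Communication-link failures are left out for brevity. The SLR uses the usual definition (makespan divided by the sum of the minimum computation costs of the critical-path tasks), which the paper may refine. All function names and numbers are hypothetical.

```python
import math


def schedule_reliability(busy_time, failure_rate):
    """Probability that no processor fails while running its assigned tasks.

    Assumption: independent processors with constant failure rates, so
    the per-processor survival probabilities multiply to exp(-sum(rate * time)).
    """
    exponent = sum(failure_rate[p] * t for p, t in busy_time.items())
    return math.exp(-exponent)


def schedule_length_ratio(makespan, critical_path_costs):
    """Makespan normalized by a lower bound on the schedule length.

    critical_path_costs holds, for each critical-path task, its computation
    cost on every processor; the bound sums the cheapest option per task.
    """
    lower_bound = sum(min(costs) for costs in critical_path_costs)
    return makespan / lower_bound


# Hypothetical two-processor schedule.
busy_time = {"p0": 40.0, "p1": 25.0}        # total execution time per processor
failure_rate = {"p0": 1e-4, "p1": 2e-4}     # failures per time unit
critical_path_costs = [[10.0, 14.0], [20.0, 16.0], [9.0, 12.0]]

reliability = schedule_reliability(busy_time, failure_rate)
slr = schedule_length_ratio(50.0, critical_path_costs)
print(f"reliability={reliability:.5f}, SLR={slr:.3f}")
```

Under these assumptions, moving work to a processor with a lower failure rate raises reliability but may lengthen the makespan, which is the trade-off a reliability-driven scheduler has to balance.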

Journal: Journal of Ambient Intelligence and Humanized Computing
Publication Date: Jun 27, 2018
Citations: 7

Similar Papers

Reliability-aware scheduling strategy for heterogeneous distributed computing systems
Xiaoyong Tang ... Bharadwaj Veeravalli
Journal of Parallel and Distributed Computing | VOL. 70
31 May 2010

An Empirical Study of Task Scheduling Strategies for Image Processing Application on Heterogeneous Distributed Computing System
...
Scalable Computing Practice and Experience | VOL. 3
01 Jan 1999

A high performance algorithm for static task scheduling in heterogeneous distributed computing systems
Mohammad I Daoud ... Nawwaf Kharma
Journal of Parallel and Distributed Computing | VOL. 68
28 Jul 2007

An Efficient Genetic Algorithm for Task Scheduling in Heterogeneous Distributed Computing Systems
M.I Daoud ... N Kharma
11 Sep 2006
