A low redundancy and high time efficiency large-scale task assignment strategy for heterogeneous service-oriented cloud computing systems

Jiang Zhu, Lizan Wang, Sangyoon Oh, Tingrui Pei, Zhetao Li, Guoqi Xie

doi:10.1007/s11227-020-03403-x

Abstract

With large numbers of heterogeneous processors deployed on service-oriented cloud computing systems, random processor hardware failures are becoming an increasingly prominent issue. Replication-based fault-tolerant task assignment is a common approach to satisfying an application's reliability requirement. However, state-of-the-art algorithms suffer from either high redundancy or low time efficiency. In this work, we propose a fast task assignment for minimizing redundancy (FTAMR) algorithm that satisfies the reliability requirement of a directed acyclic graph (DAG)-based parallel application on heterogeneous service-oriented cloud computing systems. First, FTAMR quickly identifies the tasks that need to be replicated. Second, it quickly maps the selected tasks to their respective most suitable processors. It then repeats these steps until the application's reliability meets the given reliability requirement. Experimental results on real and synthetically generated parallel applications of different scales, degrees of parallelism, and degrees of heterogeneity show that FTAMR achieves the lowest redundancy and the highest time efficiency compared with state-of-the-art fault-tolerance algorithms.
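The abstract describes an iterative loop (pick tasks to replicate, map each replica to a suitable processor, repeat until the reliability requirement holds) but gives no formulas. The Python sketch below is a hedged illustration of that general replication loop, not the FTAMR algorithm itself, whose selection and mapping rules are not stated here. It assumes a commonly used exponential fault model, R = exp(-λ_k · w_{i,k}), where w_{i,k} is the execution time of task i on processor k and λ_k is that processor's constant failure rate. It also assumes that a replicated task fails only if all of its replicas fail and that the application succeeds only if every task succeeds. Each step greedily adds the replica with the largest log-reliability gain per unit of added execution time, which serves as the redundancy cost. All names (`greedy_replicate`, `wcet`, `lam`, the toy tasks `n1`..`n3`) are hypothetical, and precedence constraints and schedule length are ignored.

```python
import math


def replica_reliability(wcet_ik, lam_k):
    # Assumed exponential fault model: a replica running for wcet_ik on a
    # processor with constant failure rate lam_k survives with exp(-lam*w).
    return math.exp(-lam_k * wcet_ik)


def task_reliability(task, procs, wcet, lam):
    # A replicated task fails only if every one of its replicas fails.
    fail = 1.0
    for k in procs:
        fail *= 1.0 - replica_reliability(wcet[task][k], lam[k])
    return 1.0 - fail


def app_reliability(assign, wcet, lam):
    # Assumed series model: the application succeeds only if all tasks succeed.
    r = 1.0
    for task, procs in assign.items():
        r *= task_reliability(task, procs, wcet, lam)
    return r


def greedy_replicate(assign, wcet, lam, req):
    """Add replicas one at a time until app reliability >= req.

    Each step picks the (task, processor) pair with the largest gain in
    log-reliability per unit of added execution time (redundancy cost).
    Returns (new_assignment, requirement_met). The input is not mutated.
    """
    assign = {t: list(ps) for t, ps in assign.items()}
    while app_reliability(assign, wcet, lam) < req:
        best, best_score = None, 0.0
        for t, procs in assign.items():
            old = task_reliability(t, procs, wcet, lam)
            for k in range(len(lam)):
                if k in procs:
                    continue  # at most one replica of a task per processor
                new = task_reliability(t, procs + [k], wcet, lam)
                score = (math.log(new) - math.log(old)) / wcet[t][k]
                if score > best_score:
                    best, best_score = (t, k), score
        if best is None:
            return assign, False  # no remaining replica can raise reliability
        assign[best[0]].append(best[1])
    return assign, True


if __name__ == "__main__":
    # Hypothetical toy instance: 3 tasks, 2 processors.
    wcet = {"n1": [10, 14], "n2": [8, 6], "n3": [12, 9]}
    lam = [0.002, 0.003]
    initial = {"n1": [0], "n2": [1], "n3": [1]}
    assign, ok = greedy_replicate(initial, wcet, lam, 0.97)
    print(assign, ok)  # {'n1': [0], 'n2': [1, 0], 'n3': [1, 0]} True
```

In this toy run, the initial reliability is about 0.937. Replicating `n2` and then `n3` onto processor 0 raises it to about 0.979, so `n1` is left unreplicated. Scoring by reliability gain per unit of execution time is one simple way to trade reliability against redundancy. FTAMR's own criteria for identifying tasks and choosing the "most suitable" processor may differ.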

Full Text