A Selective Mirrored Task Based Fault Tolerance Mechanism for Big Data Application Using Cloud

Hao Wu,Qinggeng Jin,He Guo,Chenghua Zhang

doi:10.1155/2019/4807502

Abstract

With the wide deployment of cloud computing in big data processing and the growing scale of big data application, managing reliability of resources becomes a critical issue. Unfortunately, due to the highly intricate directed-acyclic-graph (DAG) based application and the flexible usage of processors (virtual machines) in cloud platform, the existing fault tolerant approaches are inefficient to strike a balance between the parallelism and the topology of the DAG-based application while using the processors, which causes a longer makespan for an application and consumes more processor time (computation cost). To address these issues, this paper presents a novel fault tolerant framework named Fault Tolerance Algorithm using Selective Mirrored Tasks Method (FAUSIT) for the fault tolerance of running a big data application on cloud. First, we provide comprehensive theoretical analyses on how to improve the performance of fault tolerance for running a single task on a processor. Second, considering the balance between the parallelism and the topology of an application, we present a selective mirrored task method. Finally, by employing the selective mirrored task method, the FAUSIT is designed to improve the fault tolerance for DAG based application and incorporates two important objects: minimizing the makespan and the computation cost. Our solution approach is evaluated through rigorous performance evaluation study using real-word workflows, and the results show that the proposed FAUSIT approach outperforms existing algorithms in terms of makespan and computation cost.

Highlights

Recent years have witnessed that the big data analysis grows dramatically, and the related applications have been used everywhere in both academia [1] and industry [2]
Based on the basic idea above, we propose the FAUSIT to improve the fault tolerance for executing a large-scale big data application; the pseudocode of the FAUSIT is listed in Algorithm 1
This paper investigates the problem of improving the fault tolerant for a big data application running on cloud

Summary

Introduction

Recent years have witnessed that the big data analysis grows dramatically, and the related applications have been used everywhere in both academia [1] and industry [2]. There is no denying that the developmental cloud platform technologies played a key role in this process; the plenty of processers in cloud make sure that the scholars can handle the significant large-scale big data processing [3,4,5,6]. The abundant use of processers by a big data application induces that the probability cannot be ignored. Motivated by the reasonable price, rapid elasticity, and shifting responsibility of maintenance, backups, and management to cloud providers, more and more big data applications have been deployed to clouds, such as EC2 [11], Google Cloud [12], and Microsoft Azure [13]

Objectives

Results

Conclusion