Abstract

MapReduce is one of the most popular distributed paradigms for processing large-scale data, thanks in part to features such as built-in fault tolerance. Hadoop is widely considered the reference open-source implementation of MapReduce for big-data processing. Using Hadoop, enterprises can process enormous volumes of data on large clusters. However, as the size of the clusters used to process data increases, the system experiences more failures during the execution of MapReduce applications. The Hadoop scheduler is responsible for scheduling and monitoring jobs and tasks; when a task fails, Hadoop reschedules it. In addition, MapReduce introduces data replication and task re-execution strategies for fault tolerance in order to meet end users' requirements. MapReduce tasks are independent of one another, which isolates the impact of a failure to a single task. Furthermore, MapReduce replicates each data block and re-executes failed tasks, which largely avoids data-transfer and checkpointing overhead during task execution. This paper aims to provide a better understanding of the fault tolerance mechanisms of Hadoop MapReduce in the presence of failures. It focuses on evaluating the performance of several representative Hadoop MapReduce applications under different execution parameters and different failure scenarios. We also present different options for injecting failures into MapReduce applications to simulate real-world failures. To trigger failures in applications or systems, we use failure injection, a technique long used in computer design to test and evaluate error-correction and failure-management schemes. Finally, we present the causes of failures and the behavior of Hadoop MapReduce during failed job processing.

