A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments

Fabrizio Marozzo, Domenico Talia, Paolo Trunfio

doi:10.1007/978-1-84996-241-4_7

Abstract

MapReduce is a programming model widely used in Cloud computing environments for processing large data sets in a highly parallel way. MapReduce implementations are based on a master-slave model. The failure of a slave is managed by re-assigning its task to another slave, while master failures are not managed by current MapReduce implementations, as designers consider such failures unlikely in reliable Cloud systems. On the contrary, node failures, including master failures, are likely to happen in dynamic Cloud scenarios, where computing nodes may join and leave the network at an unpredictable rate. Therefore, providing effective mechanisms to manage master failures is fundamental to exploiting the MapReduce model for data-intensive applications in those dynamic Cloud environments where current MapReduce implementations could be unreliable. The goal of our work is to extend the master-slave architecture of current MapReduce implementations to make it more suitable for dynamic Cloud scenarios. In particular, in this chapter we present a Peer-to-Peer (P2P)-MapReduce framework that exploits a P2P model to manage the participation of intermittent nodes, master failures, and MapReduce job recovery in a decentralized but effective way.

Full Text