Abstract

MapReduce is one of the most popular programming models for parallel data processing in Cloud environments. Standard MapReduce implementations are based on centralized master-slave architectures that do not cope well with dynamic Cloud environments in which nodes may join and leave the network at high rates. In this chapter we describe P2P-MapReduce, a framework that exploits a peer-to-peer (P2P) model to manage intermittent node participation, master failures, and MapReduce job recovery in a decentralized but effective way. Specifically, the chapter describes the P2P-MapReduce architecture, mechanisms, and implementation, and evaluates its performance. The performance results confirm that P2P-MapReduce provides a higher level of fault tolerance than a centralized implementation of MapReduce.
