Applying MapReduce Framework to Peer-to-Peer Overlay Network

Pei Xiao, Xuejie Zhang, Jin Wang, Qiang Han, Jixian Zhang, Xiaolu Zhang

doi:10.1109/icss.2014.21

Abstract

MapReduce is a programming framework widely used in cloud computing environments for processing large amounts of data in a highly parallel way. However, the current MapReduce model does not scale well: under a given hardware configuration, it can support only a cluster of limited size because the central node becomes overloaded. In this paper, we present a prototype Peer-to-Peer MapReduce system based on distributed hash tables (DHTs), which removes both the master node used for centralized MapReduce task scheduling and the name node used for managing the underlying file system, while keeping the original MapReduce workflow unchanged. In this system, the distributed file system in the bottom layer locates data through distributed hashing, while the MapReduce system in the upper layer invokes and schedules tasks through a distributed notification mechanism. In this way, the system can theoretically achieve the scalability of a Peer-to-Peer system. We evaluate the scalability of the system experimentally in network scenarios using the standard word count problem.
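
To illustrate how data can be located by distributed hashing without a name node, the following is a minimal sketch of a consistent-hashing ring in which each data block is assigned to the first peer whose identifier follows the block's hash. The abstract does not state which DHT protocol the prototype uses, so the ring layout, the SHA-1 identifier space, and the names `ring_hash`, `HashRing`, and `lookup` are all assumptions made for illustration, not the authors' implementation.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string onto a 160-bit identifier ring using SHA-1 (assumed)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hashing ring; not the paper's actual design."""

    def __init__(self, peer_names):
        if not peer_names:
            raise ValueError("the ring needs at least one peer")
        # Sort peers by their identifier on the ring.
        self._ring = sorted((ring_hash(name), name) for name in peer_names)
        self._ids = [peer_id for peer_id, _ in self._ring]

    def lookup(self, block_key: str) -> str:
        """Return the peer responsible for block_key (its successor on the ring)."""
        idx = bisect.bisect_left(self._ids, ring_hash(block_key))
        # Wrap around to the first peer when the key hashes past the last one.
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["peer-a", "peer-b", "peer-c"])
    for i in range(3):
        key = f"wordcount/input/block-{i}"
        print(key, "->", ring.lookup(key))
```

Because every peer can compute the same mapping locally, no central index is consulted. In this sketch, adding a peer only moves the blocks that now hash to the new peer, which is the property that lets the lookup layer grow with the overlay.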

Full Text