Abstract

MapReduce is a programming framework widely used in cloud computing environments for processing large amounts of data in a highly parallel way. However, the current MapReduce model does not scale well: under a given hardware configuration, it can support only a limited cluster size because the central node becomes overloaded. In this paper, we present a prototype of a Peer-to-Peer MapReduce system based on DHTs, which removes both the master node responsible for centralized MapReduce task scheduling and the name node responsible for managing the underlying file system, while leaving the original MapReduce workflow unchanged. In this system, the distributed file system in the bottom layer locates data through distributed hashing, while the MapReduce system in the upper layer invokes and schedules tasks through a distributed notification mechanism. In this way, the system can theoretically achieve the scalability of a Peer-to-Peer system. The scalability of the system is evaluated experimentally in network scenarios using the common word count problem.
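
To illustrate how data can be located through distributed hashing without a name node, the following is a minimal sketch of a consistent-hashing ring. It is not the paper's implementation: the class `HashRing`, the function `ring_position`, the peer names, and the file key are all hypothetical, and a real DHT would route the lookup across peers rather than hold the whole ring in one process.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on a 160-bit hash ring using SHA-1."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring (illustrative assumption only)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: the first node clockwise from its hash."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._ring[idx % len(self._ring)][1]


# Any peer can compute the owner of a block locally, with no central name node.
ring = HashRing(["peer-a", "peer-b", "peer-c"])
owner = ring.lookup("input/part-00000")
```

Under this scheme, adding a peer moves only the keys that fall between the new peer and its predecessor on the ring, which is one reason DHT-based designs are expected to scale better than a single central node.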
