Abstract

MapReduce is a framework for processing huge volumes of data in parallel on large clusters of nodes. Processing data at this scale requires fast coordination and allocation of resources, with the emphasis on achieving maximum performance with optimal resource use. This paper presents a technique for achieving better resource utilization. The main objective of the work is to incorporate virtualization into the Hadoop MapReduce framework and to measure the resulting performance improvement. To realize this, the master node is set up on a physical machine, and the slave nodes are set up as virtual machines (VMs) on a common physical machine by cloning Hadoop-configured VM images. To further enhance performance, the Hadoop virtual cluster is configured to use the Capacity Scheduler.