Abstract
Big data processing technology holds a prominent place in today's market. Hadoop is an efficient open-source distributed framework used to process big data at low cost on a cluster of commodity machines (nodes). In Hadoop, YARN was introduced for effective resource utilization among jobs. Still, YARN over-allocates resources to some tasks of a job and leaves cluster resources underutilized. This paper investigates the practical resource utilization of the CAPACITY and FAIR schedulers in a multi-tenant shared environment using the HiBench benchmark suite. It compares the performance of these MapReduce job schedulers in two scenarios and proposes open research questions (ORQs) with potential solutions to guide future researchers. On average, the authors found that the CAPACITY and FAIR schedulers utilize 77% of RAM and 82% of CPU cores. Finally, the experimental evaluation shows that these schedulers over-allocate resources to some tasks and leave cluster resources underutilized in different scenarios.