Abstract
Big data processing technology holds a prominent place in today's market. Hadoop is an efficient open-source distributed framework used to process big data at low cost on a cluster of commodity machines (nodes). In Hadoop, YARN was introduced for effective resource utilization among jobs. Still, YARN over-allocates resources to some tasks of a job and leaves cluster resources underutilized. This paper investigates the practical resource utilization of the CAPACITY and FAIR schedulers in a multi-tenant shared environment using the HiBench benchmark suite. It compares the performance of these MapReduce job schedulers in two scenarios and proposes open research questions (ORQs) with potential solutions to guide future researchers. On average, the authors found that the CAPACITY and FAIR schedulers utilize 77% of RAM and 82% of CPU cores. Finally, the experimental evaluation shows that these schedulers over-allocate resources to some tasks and leave cluster resources underutilized in different scenarios.