Improved Job Scheduling for Achieving Fairness on Apache Hadoop YARN

Wint Thida Zaw, Thet Hsu Aung

doi:10.1109/icait51105.2020.9261793

Abstract

Enormous amounts of data are gathered from social media sites, mobile devices, and other business environments. Analyzing this data creates large workloads for distributed applications, and the resources of a single machine are insufficient for them. Hadoop YARN (Yet Another Resource Negotiator) enables multiple applications to run on a Hadoop cluster so that resources are used efficiently, and it supports the data-parallel programming model. As part of the open-source Hadoop framework for distributed applications, YARN performs job scheduling and monitoring, alongside the storage, processing, and analysis of big data on commodity hardware. Apache Hadoop provides over 200 configuration parameters with default settings for all types of clusters and applications. If these parameters are misconfigured, one or more machines in the cluster may degrade system performance, whereas appropriate parameter tuning can improve it. Tuning the parameter configuration so that system resources are used efficiently is therefore a challenge for the Apache Hadoop framework. In this paper, YARN parameters are tuned to improve execution time and make job scheduling more efficient.

Full Text