YARN Schedulers for Hadoop MapReduce Jobs: Design Goals, Issues and Taxonomy

Gnanendra Kotikam, S Lokesh

doi:10.2174/2666255816666220831125012

Abstract

Objective: Big data processing is a demanding task, and several big data processing frameworks have emerged in recent decades. The performance of these frameworks depends greatly on their resource management models. Methods: YARN is one such model; it acts as a resource management layer and provides computational resources to execution engines (Spark, MapReduce, Storm, etc.) through its schedulers. The most important aspect of resource management is job scheduling. Results: In this paper, we first present the design goals of YARN's real-life schedulers (FIFO, Capacity, and Fair) for the MapReduce engine. Later, we discuss the scheduling issues of the Hadoop MapReduce cluster. Conclusion: Many efforts have been made in the literature to address the issues of data locality, heterogeneity, stragglers, skew mitigation, and fairness in Hadoop MapReduce scheduling. Lastly, we present a taxonomy of the scheduling algorithms available in the literature, based on factors such as environment, scope, approach, objective, and addressed issues.
