An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

Abdol Karim Javanmardi, S Hadi Yaghoubyan, Karamollah Bagherifard, Samad Nejatian, Hamid Parvin

doi:10.1007/s11227-020-03487-5

https://doi.org/10.1007/s11227-020-03487-5

Abstract

Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important algorithms with regard to min-share is the FAIR scheduler, which Facebook Inc. developed for its own needs and which assigns an equal min-share to every user. This article proposes a method that aims to improve on existing methods in automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed first, and a scheduler is then defined on this model. The scheduler contains four components: three schedule jobs, and one distributes the data for each job among the nodes. The scheduler can run on heterogeneous Hadoop clusters and execute jobs in parallel, and a different min-share can be assigned to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size. Each of these approaches can also be used on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.
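
To make the idea of per-user min-shares concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes the cluster is modeled as a fixed number of interchangeable task slots; the function name `allocate_slots` and its parameters are hypothetical. Each user first receives its own min-share (capped by demand), and any leftover slots are handed out one at a time to the users that currently hold the fewest.

```python
def allocate_slots(total_slots, demands, min_shares):
    """Illustrative only: give each user its min-share, then share the rest.

    total_slots -- number of task slots in the (hypothetical) cluster
    demands     -- dict: user -> number of slots the user could use
    min_shares  -- dict: user -> guaranteed minimum slots (may differ per user)
    """
    # Step 1: grant every user its min-share, never more than it demands.
    alloc = {u: min(demands[u], min_shares.get(u, 0)) for u in demands}
    if sum(alloc.values()) > total_slots:
        raise ValueError("min-shares exceed cluster capacity")
    remaining = total_slots - sum(alloc.values())

    # Step 2: distribute leftover slots one by one, poorest user first,
    # until slots run out or every demand is satisfied.
    while remaining > 0:
        hungry = [u for u in demands if alloc[u] < demands[u]]
        if not hungry:
            break
        for u in sorted(hungry, key=lambda user: alloc[user]):
            if remaining == 0:
                break
            alloc[u] += 1
            remaining -= 1
    return alloc


# Users "a" and "b" have different min-shares; "c" has none.
print(allocate_slots(10, {"a": 8, "b": 8, "c": 2}, {"a": 5, "b": 1}))
# prints {'a': 6, 'b': 2, 'c': 2}
```

A scheduler for a heterogeneous cluster would also have to account for node speed and data locality, which this sketch deliberately ignores.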

Full Text