LSTPD: Least Slack Time-Based Preemptive Deadline Constraint Scheduler for Hadoop Clusters

Ihsan Ullah, Junsu Kim, Muhammad Sajjad Khan, Muhammad Amir, Su Min Kim

doi:10.1109/access.2020.3002565

Abstract

Big data refers to large, complex datasets in many forms that require dedicated computational platforms for analysis. Hadoop is one of the popular frameworks for big data analytics. In Hadoop, a large job is split into multiple small tasks, which are distributed to worker nodes and run in parallel using MapReduce to speed up computation. Improving throughput is therefore important. MapReduce jobs require quick responses from the worker nodes to complete within their deadlines. The existing Hadoop schedulers, such as the FIFO, fair, and capacity schedulers, cannot guarantee the quick response needed to satisfy a deadline specified in advance. Thus, the Hadoop system needs to improve response time and completion time for heterogeneous MapReduce jobs. In this paper, we propose an efficient preemptive deadline constraint scheduler based on least slack time and data locality. To achieve better task allocation and load balancing, we first analyze the task scheduling behavior of the Hadoop platform. Based on that analysis, we propose a novel preemptive approach that considers the remaining execution time of the currently executing job when deciding on preemption. The experimental results show that the proposed scheme significantly reduces job execution time and queue waiting time compared to existing schemes.
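As a rough illustration of the least slack time idea, the sketch below orders jobs by slack, using the standard definition slack = deadline - current time - remaining execution time. The `Job` class, the function names, and in particular the `should_preempt` rule are assumptions made for this example; the abstract states only that the remaining execution time of the running job is considered, not the exact rule the scheduler applies.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute deadline, in seconds
    remaining: float  # estimated remaining execution time, in seconds


def slack(job: Job, now: float) -> float:
    """Standard slack: time left until the deadline minus remaining work."""
    return job.deadline - now - job.remaining


def pick_least_slack(queue: list[Job], now: float) -> Job:
    """Return the queued job with the smallest slack (the most urgent one)."""
    return min(queue, key=lambda job: slack(job, now))


def should_preempt(running: Job, candidate: Job, now: float) -> bool:
    """Hypothetical preemption rule (an assumption, not the paper's rule).

    Preempt only if the candidate is more urgent than the running job AND
    waiting for the running job to finish would use up the candidate's slack.
    """
    candidate_slack = slack(candidate, now)
    more_urgent = candidate_slack < slack(running, now)
    waiting_misses_deadline = running.remaining > candidate_slack
    return more_urgent and waiting_misses_deadline


if __name__ == "__main__":
    a = Job("A", deadline=100.0, remaining=30.0)
    b = Job("B", deadline=50.0, remaining=40.0)
    c = Job("C", deadline=200.0, remaining=20.0)
    print(pick_least_slack([a, b, c], now=0.0).name)
    print(should_preempt(running=a, candidate=b, now=0.0))
```

Under this assumed rule, a nearly finished job is left to complete when the waiting candidate can still meet its deadline afterwards, which avoids paying the cost of a preemption that gains nothing.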

Highlights

In recent years, cloud computing and big data have attracted researchers’ attention.
We present a preemptive approach for effectively scheduling jobs so that their total completion time is reduced under given deadlines and least slack time.
We present the existing schedulers for Hadoop, which schedule submitted MapReduce jobs based on their requirements and the available resources in a computing cluster. The fair scheduler [27] was proposed to assign resources so that, on average, all jobs receive an equal share over time.

Summary

INTRODUCTION

Cloud computing and big data have attracted researchers’ attention. Hadoop is a distributed computing framework based on the MapReduce model that runs applications on a cluster of many inexpensive commodity computing nodes. The MapReduce model was introduced by Google in 2004 to handle big data applications through parallel processing. The scheme proposed in this paper attempts to solve these issues by focusing on meeting job deadlines in a shared computing environment. This requires accurate estimation of the map and reduce task computation times. The contributions include dynamic workload scheduling with queue-wise preemption based on job priority, to maximize the resource utilization of a Hadoop cluster, and the development of a multi-server queuing model applicable to the proposed scheme, to improve the schedulability of MapReduce jobs under different constraints and requirements.
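The summary does not say which multi-server queuing model is used. As one common example of such a model, the sketch below computes the mean queue waiting time of an M/M/c queue with the Erlang C formula. Treating the cluster as M/M/c, and the names `erlang_c` and `mean_queue_wait`, are assumptions made purely for illustration.

```python
from math import factorial


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving job must wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    utilization = offered_load / servers        # rho = a / c
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    # Term for the states in which all servers are busy.
    busy_term = offered_load**servers / factorial(servers) / (1.0 - utilization)
    # Terms for the states in which at least one server is idle.
    idle_terms = sum(offered_load**k / factorial(k) for k in range(servers))
    return busy_term / (idle_terms + busy_term)


def mean_queue_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time a job spends waiting before service starts (Wq)."""
    wait_probability = erlang_c(arrival_rate, service_rate, servers)
    return wait_probability / (servers * service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical rates: 2 jobs/s arriving, each of 2 servers finishes 2 jobs/s.
    print(mean_queue_wait(arrival_rate=2.0, service_rate=2.0, servers=2))
```

With a single server the formula reduces to the familiar M/M/1 result, which is a convenient sanity check before applying it to larger server counts.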

RELATED WORK

HADOOP SCHEDULERS

PERFORMANCE EVALUATION

Findings

CONCLUSION


Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 11	License type: CC BY 4.0
