Abstract

“More data, more information.” Big data helps businesses and research communities gain insights and increase productivity. Many public cloud providers offer Hadoop MapReduce as a pay-per-use service on infrastructure-as-a-service clusters of virtual machines, promising on-demand horizontal scaling. These virtual machines are launched on different physical machines across racks in cloud data centers. Such multi-tenancy introduces performance heterogeneity among Hadoop virtual machines, caused by hardware heterogeneity and interference from co-located virtual machines. Performance heterogeneity significantly affects MapReduce job latency and the resource utilization of rented Hadoop virtual clusters. Default MapReduce schedulers assign map and reduce tasks assuming homogeneous hardware, while interference-aware schedulers rely only on the interference patterns generated by co-located virtual machines; neither considers the heterogeneous performance of the virtual machines themselves. Therefore, we propose a dynamic ranking-based MapReduce job scheduler that places map and reduce tasks according to each virtual machine’s performance rank to minimize job latency and improve resource utilization. Our approach calculates a performance score for each virtual machine based on hardware heterogeneity and co-located virtual machine interference, then ranks the virtual machines separately by map and reduce performance to place map and reduce tasks. To demonstrate our ideas, we set up a test bed with 29 virtual machines on eight physical machines with different configurations and capacities. We modify the default fair scheduler in Hadoop 2.x to incorporate our ideas and evaluate it with different workloads from the PUMA dataset. The proposed method is then compared against the default (resource-aware) fair scheduler and an interference-aware scheduler in terms of job latency and resource utilization. Finally, we argue in favor of our approach, as it improves resource utilization by 30–65% and overall job latency by up to 30%.
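As a rough illustration of the ranking idea summarized above, the sketch below scores each virtual machine by discounting its observed throughput with a measured interference factor, then ranks the machines separately for map and reduce placement. It is a minimal sketch under assumed inputs: the class and field names (VmProfile, mapThroughput, reduceThroughput, interferenceFactor) are hypothetical placeholders, not the paper's actual scheduler API, which is implemented as a modification of the Hadoop 2.x fair scheduler.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RankingPlacementSketch {

    // Hypothetical per-VM profile; in the paper these metrics would come from
    // monitoring hardware capability and co-located VM interference.
    static final class VmProfile {
        final String vmId;
        final double mapThroughput;      // observed map-phase throughput (tasks/min)
        final double reduceThroughput;   // observed reduce-phase throughput (tasks/min)
        final double interferenceFactor; // 0 = no co-located interference, 1 = fully degraded

        VmProfile(String vmId, double mapThroughput,
                  double reduceThroughput, double interferenceFactor) {
            this.vmId = vmId;
            this.mapThroughput = mapThroughput;
            this.reduceThroughput = reduceThroughput;
            this.interferenceFactor = interferenceFactor;
        }

        // Performance score: raw throughput discounted by interference.
        double mapScore()    { return mapThroughput * (1.0 - interferenceFactor); }
        double reduceScore() { return reduceThroughput * (1.0 - interferenceFactor); }
    }

    // Rank VMs by map score, best first; map tasks are placed in this order.
    static List<VmProfile> rankForMap(List<VmProfile> vms) {
        List<VmProfile> ranked = new ArrayList<>(vms);
        ranked.sort(Comparator.comparingDouble(VmProfile::mapScore).reversed());
        return ranked;
    }

    // Rank VMs by reduce score, best first; reduce tasks are placed in this order.
    static List<VmProfile> rankForReduce(List<VmProfile> vms) {
        List<VmProfile> ranked = new ArrayList<>(vms);
        ranked.sort(Comparator.comparingDouble(VmProfile::reduceScore).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<VmProfile> vms = List.of(
            new VmProfile("vm-1", 12.0, 8.0, 0.10),
            new VmProfile("vm-2", 15.0, 6.0, 0.40),
            new VmProfile("vm-3", 10.0, 9.0, 0.05));

        System.out.println("Map placement order:    "
            + rankForMap(vms).stream().map(v -> v.vmId).toList());
        System.out.println("Reduce placement order: "
            + rankForReduce(vms).stream().map(v -> v.vmId).toList());
    }
}
```

Ranking map and reduce performance separately reflects the observation that a virtual machine favorable for CPU-bound map tasks may rank lower for shuffle- and I/O-heavy reduce tasks.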
