Improving Hadoop MapReduce performance on heterogeneous single board computer clusters

Sooyoung Lim, Dongchul Park

doi:10.1016/j.future.2024.06.025

Abstract

Over the past decade, Apache Hadoop has become a leading framework for big data processing. Single board computer (SBC) clusters, predominantly built from Raspberry Pi (RPi) boards, have been used to explore low-power, low-cost MapReduce processing because, capital costs aside, power consumption has become a primary concern in many industries. Once an SBC cluster is built, it is common to add more nodes, particularly newer-generation SBCs, or to replace old (or inactive) nodes with new ones to improve performance, which inevitably makes the cluster heterogeneous. Running Hadoop on such heterogeneous SBC clusters raises new challenges because computing resources differ from node to node. Native Hadoop does not carefully consider the heterogeneity of the cluster nodes. Consequently, heterogeneous SBC Hadoop clusters suffer from significant performance variation or, more critically, persistent node failures. This paper proposes a new Hadoop Yet Another Resource Negotiator (YARN) architecture design to improve MapReduce performance on heterogeneous SBC Hadoop clusters with tight computing resources. We implement two main scheduling policies on Hadoop YARN based on the correct computing resource information that each SBC node provides: (1) two MapReduce task scheduling frameworks (master-driven and slave-driven) to determine the more effective processing mode and (2) ApplicationMaster (AM) and reduce task distribution mechanisms to provide the best Hadoop performance by minimizing performance variation. Thus, the proposed Hadoop framework makes the best use of the performance-frugal SBC Hadoop cluster by intelligently distributing MapReduce tasks to each node. To our knowledge, the proposed framework is the first redesigned Hadoop YARN architecture to address these challenges specifically on heterogeneous SBC Hadoop clusters for big data processing.
Extensive experiments with Hadoop benchmarks demonstrate that the redesigned framework outperforms native Hadoop by an average of 2.55× and 1.55× under I/O-intensive and CPU-intensive workloads, respectively.
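
The abstract does not give the algorithm behind the reduce task distribution mechanism, so the following is only a minimal illustrative sketch of the general idea of resource-aware distribution, not the authors' implementation. It assumes each node reports its available container memory and splits reduce tasks in proportion to that capacity using largest-remainder rounding. The names `Node`, `distribute_reduce_tasks`, and `task_memory_mb`, as well as the node sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A cluster node and the resources it reports (hypothetical model)."""
    name: str
    memory_mb: int  # memory the node reports as available for containers


def distribute_reduce_tasks(nodes, num_tasks, task_memory_mb):
    """Split num_tasks across nodes in proportion to their memory capacity.

    Assumption: capacity is measured as how many task containers of
    task_memory_mb fit into a node's reported memory. A node may receive
    more tasks than fit at once; they would then run in successive waves.
    """
    capacity = {n.name: n.memory_mb // task_memory_mb for n in nodes}
    total = sum(capacity.values())
    if total == 0:
        raise ValueError("no node has enough memory for a single task")

    # Ideal (fractional) share of tasks for each node.
    shares = {name: num_tasks * cap / total for name, cap in capacity.items()}

    # Round down first, then hand out the leftover tasks to the nodes
    # with the largest fractional remainders.
    assignment = {name: int(share) for name, share in shares.items()}
    leftover = num_tasks - sum(assignment.values())
    by_remainder = sorted(
        shares, key=lambda name: shares[name] - assignment[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        assignment[name] += 1
    return assignment


# Hypothetical heterogeneous cluster: one older board, two newer ones.
cluster = [
    Node("old-sbc", 1024),
    Node("new-sbc-1", 4096),
    Node("new-sbc-2", 4096),
]
plan = distribute_reduce_tasks(cluster, num_tasks=9, task_memory_mb=512)
print(plan)
```

In this sketch the older board receives fewer reduce tasks than the newer boards, which is the kind of skew a uniform distribution would not produce. A real scheduler would also need to account for CPU, I/O, and where the ApplicationMaster runs.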

Journal: Future Generation Computer Systems
Publication Date: Jun 15, 2024
Citations: 1

Similar Papers

Performance analysis of single board computer clusters
Philip J Basford ... Simon J Cox
Future Generation Computer Systems | VOL. 102 | 22 Jul 2019

Evaluating single board computer clusters for cyber operations
Suzanne J Matthews ... Raymond W Blaine
01 Oct 2016

An Energy-Friendly Scheduler for Edge Computing Systems
Alejandro Llorens-Carrodeguas ... Cristina Cervelló-Pastor
Sensors | VOL. 21 | 28 Oct 2021

SARN: A scalable resource managing framework for YARN
Zhonghao Lu ... Jingyu Wang
01 Aug 2016
