Adaptive memory reservation strategy for heavy workloads in the Spark environment

Bohan Li, Xin He, Junyang Yu, Guanghui Wang, Yixin Song, Shunjie Pan, Hangyu Gu

doi:10.7717/peerj-cs.2460

Abstract

The rise of the Internet of Things (IoT) and Industry 4.0 has spurred a growing need for large-scale data computing, and Spark has emerged as a promising Big Data platform thanks to its distributed in-memory computing capabilities. In practice, however, heavy workloads often cause memory bottlenecks on the Spark platform. These bottlenecks lead to the eviction of resilient distributed datasets (RDDs) and, in extreme cases, to severe memory contention, both of which significantly degrade Spark's computational efficiency. To tackle this issue, we propose an adaptive memory reservation (AMR) strategy designed for heavy workloads in the Spark environment. First, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. From this model we determine the memory required for optimal task parallelism, which establishes an efficient execution memory space for parallel computation. Next, the strategy reserves execution memory adaptively and adjusts the reservation dynamically, compressing or expanding it according to task progress, so that task parallelism is maintained throughout the Spark parallel computing process. Finally, considering the cost of each RDD cache location and real-time memory usage, we select suitable storage locations for different RDD types to relieve pressure on execution memory. We conduct extensive laboratory experiments to validate the effectiveness of AMR. The results indicate that, compared to existing memory management solutions, AMR reduces execution time by approximately 46.8%.
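The relationship between execution memory and task parallelism mentioned in the abstract can be illustrated with a minimal sketch. It is not the AMR model, which the abstract does not specify. It only reproduces the arithmetic of Spark's stock unified memory manager (300 MiB reserved, `spark.memory.fraction` defaulting to 0.6, `spark.memory.storageFraction` defaulting to 0.5) and then applies a simple, assumed bound: the number of tasks that fit in execution memory at once, given a hypothetical per-task memory estimate. The function names and the `per_task_bytes` input are assumptions made for illustration.

```python
MIB = 1024 * 1024
RESERVED_BYTES = 300 * MIB  # fixed reservation in Spark's unified memory manager


def unified_memory_regions(heap_bytes, memory_fraction=0.6, storage_fraction=0.5):
    """Return (execution_bytes, storage_bytes) for one executor heap.

    Mirrors the default split: (heap - 300 MiB) * spark.memory.fraction is the
    unified region, and spark.memory.storageFraction of it is the storage part.
    The boundary is soft in Spark; this returns only the initial split.
    """
    usable = heap_bytes - RESERVED_BYTES
    if usable <= 0:
        raise ValueError("heap is smaller than the reserved memory")
    unified = usable * memory_fraction
    storage = unified * storage_fraction
    execution = unified - storage
    return execution, storage


def max_nonblocking_tasks(execution_bytes, per_task_bytes, cores):
    """Assumed bound (not from the paper): tasks that fit in execution memory
    simultaneously, capped by the number of cores and floored at one task."""
    if per_task_bytes <= 0:
        raise ValueError("per_task_bytes must be positive")
    fit = int(execution_bytes // per_task_bytes)
    return max(1, min(cores, fit))


if __name__ == "__main__":
    execution, storage = unified_memory_regions(4 * 1024 * MIB)  # 4 GiB heap
    print(f"execution = {execution / MIB:.1f} MiB, storage = {storage / MIB:.1f} MiB")
    print("tasks without blocking:", max_nonblocking_tasks(execution, 256 * MIB, cores=8))
```

With a 4 GiB heap and a 256 MiB per-task estimate, the bound is limited by memory rather than by the eight cores, which is the kind of situation in which reserving more execution memory, or moving cached RDDs elsewhere, would raise the achievable parallelism.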

Journal: PeerJ Computer Science
Publication Date: Nov 13, 2024
License type: CC BY 4.0

Similar Papers

Survey on high performance analytics of bigdata with apache spark
Ramkrushna C Maheshwar ... D Haritha
01 May 2016

Detection outliers on internet of things using big data technology
Haitham Ghallab ... Mona Nasr
Egyptian Informatics Journal | VOL. 21
26 Dec 2019

Clustering of Zika virus epidemic using Gaussian mixture model in spark environment.
Lavanya K ... Prakhar Jain
Biomedical Research | VOL. 30
01 Jan 2019

A Model-Driven Parallel Processing System for IoT Data Based on User-Defined Functions
Jianqiao Luo ... Li Zhang
01 Apr 2020