Optimizing data locality by executor allocation in Spark computing environment

Zhongming Fu, Mengsi He, Yang Zhang, Zhuo Tang

doi:10.2298/csis220131065f

Abstract

Data locality is an important concept in big data processing. Most existing research optimizes data locality through task scheduling. However, executors are the containers in which tasks run, so the nodes on which executors are started directly affect the locality level that tasks can achieve. This paper aims to improve data locality for the reduce stage in the Spark computing environment through executor allocation. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then we propose two algorithms: an approximation algorithm for the case where the network distances between executors satisfy the triangle inequality, and a greedy algorithm for the case where they do not. Finally, we evaluate the algorithms on a practical Spark cluster using representative micro-benchmarks (Sort and Join) and macro-benchmarks (PageRank and LDA). Experimental results show that the proposed algorithms decrease task execution time by reducing data communication.
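
The abstract does not give the exact formulation or the algorithms themselves, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: the allocation problem is taken to mean choosing k executors out of n candidates so that the sum of pairwise network distances among the chosen executors is as small as possible. The function names (`total_distance`, `greedy_allocate`), the seeding rule, and the distance values are all hypothetical illustrations, not the authors' greedy algorithm.

```python
from itertools import combinations


def total_distance(dist, chosen):
    """Sum of pairwise network distances among the chosen executors."""
    return sum(dist[i][j] for i, j in combinations(chosen, 2))


def greedy_allocate(dist, k):
    """Pick k candidate indices with a simple greedy heuristic.

    Assumption: this is an illustrative heuristic, not the paper's algorithm.
    It seeds the selection with the closest pair, then repeatedly adds the
    candidate whose summed distance to the already chosen set is smallest.
    It does not rely on the triangle inequality and gives no optimality
    guarantee.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    if k == 1:
        return [0]  # any single executor has zero pairwise distance

    # Seed with the pair of candidates that are closest to each other.
    first, second = min(combinations(range(n), 2),
                        key=lambda pair: dist[pair[0]][pair[1]])
    chosen = [first, second]

    while len(chosen) < k:
        remaining = [c for c in range(n) if c not in chosen]
        # Add the candidate that increases the total distance the least.
        best = min(remaining, key=lambda c: sum(dist[c][s] for s in chosen))
        chosen.append(best)
    return sorted(chosen)


# Hypothetical symmetric distance matrix for five candidate executors.
dist = [
    [0, 1, 4, 5, 6],
    [1, 0, 4, 5, 6],
    [4, 4, 0, 2, 6],
    [5, 5, 2, 0, 6],
    [6, 6, 6, 6, 0],
]

selection = greedy_allocate(dist, 3)
print(selection, total_distance(dist, selection))
```

In this toy matrix the heuristic seeds with candidates 0 and 1 (distance 1) and then adds candidate 2, for a total pairwise distance of 9. A real allocator would derive the matrix from the cluster topology rather than hard-coding it.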


Journal: Computer Science and Information Systems
Publication Date: Jan 1, 2023
Citations: 2
License type: CC BY-NC-ND 4.0


