A hypergraph based task scheduling strategy for massive parallel spatial data processing on master-slave platforms

Bo Cheng, Huayi Wu, Xuefeng Guan

doi:10.1109/geoinformatics.2015.7378674

Abstract

Data-intensive applications are now ubiquitous in many fields, e.g. high-energy physics, astronomy, climate modeling, and the geosciences. In spatial data processing on distributed platforms in particular, parallel tasks generally have to handle massive spatial datasets and are usually treated as data-intensive applications. To address the challenge of massive spatial data processing, we propose a hypergraph-based task scheduling strategy for a master-slave platform. Our task scheduling strategy involves two consecutive steps: mapping and scheduling. In the mapping step, we formulate a hypergraph partition model to decide which tasks will be executed by each slave processor. The scheduling step then determines the execution order of the selected tasks and the order in which the master transfers files to the slaves. An experiment based on the GridSim toolkit was conducted to evaluate our scheduling algorithm and compare it with the Min-min heuristic. The experimental results show that our scheduling strategy outperforms the Min-min heuristic.
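For readers unfamiliar with the baseline, the Min-min heuristic can be sketched in its common textbook form as follows. This is only an illustration, not the configuration used in the experiment: the function name `min_min` and the `etc` matrix (estimated time to compute each task on each processor) are assumptions introduced here, and the sketch ignores the file transfers from the master that the proposed strategy accounts for.

```python
from typing import Dict, List, Tuple


def min_min(etc: List[List[float]]) -> Tuple[Dict[int, int], float]:
    """Textbook Min-min: etc[t][p] is the assumed run time of task t on processor p.

    Returns (task -> processor assignment, makespan).
    """
    if not etc:
        return {}, 0.0

    n_procs = len(etc[0])
    ready = [0.0] * n_procs            # time at which each processor becomes free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        best = None  # (completion_time, task, processor)
        for t in sorted(unscheduled):
            # Processor giving task t its earliest completion time.
            p = min(range(n_procs), key=lambda q: ready[q] + etc[t][q])
            ct = ready[p] + etc[t][p]
            # Keep the task whose earliest completion time is smallest overall.
            if best is None or ct < best[0]:
                best = (ct, t, p)
        ct, t, p = best
        assignment[t] = p
        ready[p] = ct                  # processor p is busy until task t finishes
        unscheduled.remove(t)

    return assignment, max(ready)


if __name__ == "__main__":
    # Three tasks, two processors (made-up run times).
    assignment, makespan = min_min([[3, 5], [2, 4], [6, 1]])
    print(assignment, makespan)
```

Because Min-min picks tasks one at a time by completion time alone, it has no notion of which tasks share input files; that is the kind of information a hypergraph partition model can capture in the mapping step.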
