Abstract

Data-intensive applications are now ubiquitous in many fields, e.g., high-energy physics, astronomy, climate modeling, and the geosciences. In spatial data processing on distributed platforms in particular, parallel tasks must generally handle massive spatial datasets and are therefore usually treated as data-intensive applications. To address the challenge of massive spatial data processing, we propose a hypergraph-based task scheduling strategy for a master-slave platform. The strategy involves two consecutive steps: mapping and scheduling. In the mapping step, we formulate a hypergraph partitioning model to decide which tasks each slave processor will execute. The scheduling step determines the execution order of the selected tasks and the order in which the master transfers files to the slaves. We conducted an experiment based on the GridSim toolkit to evaluate our scheduling algorithm and compare it with the Min-min heuristic. The experimental results show that our scheduling strategy outperforms the Min-min heuristic.
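
For readers unfamiliar with the baseline, the following is a minimal sketch of the Min-min heuristic in its commonly described form, not the implementation used in the experiment. It assumes a hypothetical matrix `etc[t][m]` giving the estimated execution time of task `t` on machine `m`, and it ignores the master-to-slave file transfers that the proposed strategy schedules explicitly. The function name `min_min` and the sample values are illustrative only.

```python
def min_min(etc):
    """Schedule tasks with the Min-min heuristic.

    etc[t][m] is the (assumed) estimated execution time of task t on machine m.
    Returns (schedule, makespan), where schedule is a list of
    (task, machine, completion_time) tuples in assignment order.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    schedule = []

    while unscheduled:
        best = None                     # (completion_time, task, machine)
        # Find the task/machine pair with the smallest completion time.
        # This equals picking, among each task's minimum completion time,
        # the task whose minimum is smallest.
        for t in sorted(unscheduled):
            for m in range(n_machines):
                completion = ready[m] + etc[t][m]
                if best is None or completion < best[0]:
                    best = (completion, t, m)

        completion, t, m = best
        ready[m] = completion           # the chosen machine is busy until then
        unscheduled.remove(t)
        schedule.append((t, m, completion))

    return schedule, max(ready)


# Hypothetical example: 3 tasks, 2 machines.
example_etc = [
    [3, 5],
    [2, 4],
    [6, 1],
]
example_schedule, example_makespan = min_min(example_etc)
```

Because Min-min only looks at completion times, it has no notion of which tasks share input files; that data-sharing structure is what a hypergraph partitioning model is meant to capture in the mapping step.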
