Abstract

Spatial data processing often involves massive datasets, and the task/data scheduling efficiency of these applications has a direct impact on overall processing performance. Among existing scheduling strategies, hypergraph-based algorithms capture the data-sharing pattern globally and significantly reduce the total communication volume. On heterogeneous processing platforms, however, a single hypergraph partitioning used for later scheduling may not be optimal. Moreover, these scheduling algorithms neglect the overlap between task execution and data transfer, which could further decrease execution time. To address these problems, an extended hypergraph-based task-scheduling algorithm, named Hypergraph+, is proposed for massive spatial data processing. Hypergraph+ improves upon current hypergraph scheduling algorithms in two ways: (1) it takes platform heterogeneity into consideration, offering a metric function to evaluate partitioning quality and derive the best task/file schedule; and (2) it maximizes the overlap between communication and computation. The GridSim toolkit was used to evaluate Hypergraph+ with an inverse distance weighting (IDW) spatial interpolation application on heterogeneous master-slave platforms. Experiments show that the proposed Hypergraph+ algorithm achieves on average a 43% smaller makespan than the original hypergraph scheduling algorithm while still preserving high scheduling efficiency.
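
The second improvement can be illustrated with a minimal sketch (not the paper's scheduler) of why overlapping data transfer with task execution shortens a node's makespan. The task durations below are hypothetical; each task is a pair of (transfer time, compute time) on a single slave node.

```python
# Minimal sketch: overlap of communication and computation on one node.
# Task tuples are (transfer_time, compute_time); all values are hypothetical.

def makespan_no_overlap(tasks):
    """Each task transfers its input files, then executes, strictly in sequence."""
    return sum(t + c for t, c in tasks)

def makespan_with_overlap(tasks):
    """The next task's files are fetched while the current task executes."""
    transfer_done = 0.0   # when the node's link finishes the current transfer
    compute_done = 0.0    # when the CPU finishes the current task
    for t, c in tasks:
        transfer_done += t                                   # transfers are serialized on the link
        compute_done = max(transfer_done, compute_done) + c  # wait for data, then run
    return compute_done

tasks = [(4.0, 6.0), (3.0, 5.0), (2.0, 7.0)]   # hypothetical workload
print(makespan_no_overlap(tasks))    # 27.0
print(makespan_with_overlap(tasks))  # 22.0
```

In the non-overlapped case the node is idle during every transfer; with prefetching, transfers of later tasks are hidden behind the computation of earlier ones, which is the effect Hypergraph+ exploits.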

Highlights

  • In recent years, with the rapid development of surveying and remote sensing technologies, the volume of spatial data has increased dramatically [1,2,3]

  • We propose an extended hypergraph-based task-scheduling algorithm, named Hypergraph+

  • Since the task execution time can be defined in terms of million instructions (MI), the CPU resource speed was modeled as million instructions per second (MIPS); a sketch of this resource model follows the list
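
A minimal sketch of this GridSim-style resource model: a task's length is expressed in MI, a node's speed in MIPS, and the expected execution time on a node is simply length divided by speed. The task lengths and node ratings below are hypothetical.

```python
# Hypothetical task lengths (MI) and heterogeneous node speeds (MIPS).
task_length_mi = {"t1": 12_000, "t2": 30_000}
node_speed_mips = {"slave_1": 1_500, "slave_2": 3_000}

def expected_runtime(task, node):
    """Seconds needed to execute `task` on `node` (ignoring data-transfer time)."""
    return task_length_mi[task] / node_speed_mips[node]

print(expected_runtime("t1", "slave_1"))  # 8.0 s
print(expected_runtime("t2", "slave_2"))  # 10.0 s
```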

Introduction

With the rapid development of surveying and remote sensing technologies, the volume of spatial data has increased dramatically [1,2,3]. Spatial data processing is a typical data-intensive application in which users must access and process massive spatial data. Each task requires a subset of input files from the storage nodes; a task may share a number of files with other tasks, while each individual task is submitted to one computing node for execution. The computing nodes themselves are connected to the storage nodes through a network for data transfer. This collaboration is orchestrated by a task/data scheduling strategy, and the efficiency of that strategy has an important influence on overall processing performance.
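
A minimal sketch of the common task/file hypergraph model behind such schedulers (the concrete file names, sizes, and partition below are hypothetical): each task is a vertex and each input file is a hyperedge (net) connecting the tasks that share it, so the communication volume of an assignment of tasks to nodes can be estimated with the standard "connectivity minus one" metric, i.e. a file needed on several nodes must be transferred once per extra node.

```python
# Hyperedges: file -> set of tasks that read it (hypothetical example).
file_to_tasks = {
    "tile_A": {"t1", "t2", "t3"},
    "tile_B": {"t2", "t4"},
    "tile_C": {"t4"},
}
file_size = {"tile_A": 120, "tile_B": 80, "tile_C": 40}   # MB, hypothetical

partition = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}          # task -> computing node

def communication_volume(file_to_tasks, file_size, partition):
    """Sum over files of size * (number of nodes needing the file - 1)."""
    volume = 0
    for f, tasks in file_to_tasks.items():
        parts = {partition[t] for t in tasks}              # nodes that need file f
        volume += file_size[f] * (len(parts) - 1)
    return volume

print(communication_volume(file_to_tasks, file_size, partition))  # 200 (MB)
```

Minimizing this volume while balancing the computational load across nodes is the objective that hypergraph partitioning optimizes, and it is the starting point that Hypergraph+ extends for heterogeneous platforms.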
