Abstract
Spatial data processing often involves massive datasets, and the efficiency of task/data scheduling in these applications affects overall processing performance. Among existing scheduling strategies, hypergraph-based algorithms capture the data sharing pattern globally and significantly reduce total communication volume. On heterogeneous processing platforms, however, a single hypergraph partitioning used for later scheduling may not be optimal. Moreover, these scheduling algorithms neglect the overlap between task execution and data transfer, which could further decrease execution time. To address these problems, an extended hypergraph-based task-scheduling algorithm, named Hypergraph+, is proposed for massive spatial data processing. Hypergraph+ improves upon current hypergraph scheduling algorithms in two ways: (1) it takes platform heterogeneity into consideration, offering a metric function that evaluates partitioning quality in order to derive the best task/file schedule; and (2) it can maximize the overlap between communication and computation. The GridSim toolkit was used to evaluate Hypergraph+ in an IDW spatial interpolation application on heterogeneous master-slave platforms. Experiments illustrate that the proposed Hypergraph+ algorithm achieves on average a 43% smaller makespan than the original hypergraph scheduling algorithm while still preserving high scheduling efficiency.
Highlights
In recent years, with the rapid development of surveying and remote sensing technologies, the volume of spatial data has increased dramatically [1,2,3]
We propose an extended hypergraph-based task-scheduling algorithm, named Hypergraph+
Since the task execution time can be defined in terms of millions of instructions (MI), the CPU resource speed was modeled in millions of instructions per second (MIPS)
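The MI/MIPS modeling in the last highlight implies a simple relation: the execution time of a task on a node is its length in MI divided by the node's speed in MIPS. The following is a minimal sketch of that arithmetic only; the function name and the sample numbers are illustrative assumptions, not values from the paper.

```python
def execution_time_seconds(task_length_mi: float, cpu_speed_mips: float) -> float:
    """Estimate run time as task length (MI) divided by CPU speed (MIPS).

    Hypothetical helper for illustration; not part of GridSim or the paper.
    """
    if cpu_speed_mips <= 0:
        raise ValueError("cpu_speed_mips must be positive")
    return task_length_mi / cpu_speed_mips


# Illustrative numbers: the same 12,000 MI task on a slow and a fast node.
slow = execution_time_seconds(12_000, 400)    # 30.0 seconds
fast = execution_time_seconds(12_000, 1_200)  # 10.0 seconds
print(slow, fast)
```

On a heterogeneous platform the same task therefore has a different cost on each node, which is why a partitioning that looks balanced by task count can still be unbalanced in time.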
Summary
With the rapid development of surveying and remote sensing technologies, the volume of spatial data has increased dramatically [1,2,3]. Spatial data processing is a typical data-intensive application in which users must access and process massive spatial datasets. Each task requires a subset of input files from the storage nodes; a task may share a number of files with other tasks, while each individual task is submitted to one computing node for execution. The computing nodes are connected to the storage nodes through a network for data transfer. This collaboration is orchestrated by a task/data scheduling strategy, and the efficiency of that strategy has an important influence on collaboration performance.
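The file-sharing model described above can be made concrete with a small sketch. It assumes, for illustration only, that a file must be transferred once to every computing node that runs at least one task needing it, and it counts those transfers for a given task-to-node assignment. The task names, file names, node names, and both functions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def nodes_needing_each_file(task_files, assignment):
    """Map each file to the set of computing nodes that must receive it."""
    needed = defaultdict(set)
    for task, files in task_files.items():
        node = assignment[task]  # each task runs on exactly one node
        for f in files:
            needed[f].add(node)
    return dict(needed)


def total_transfers(task_files, assignment):
    """Count file transfers, assuming one transfer per (file, node) pair."""
    needed = nodes_needing_each_file(task_files, assignment)
    return sum(len(nodes) for nodes in needed.values())


# Three tasks; t1 and t2 share file "b", t2 and t3 share file "c".
task_files = {
    "t1": {"a", "b"},
    "t2": {"b", "c"},
    "t3": {"c", "d"},
}

# Co-locating tasks that share files avoids sending a file twice.
grouped = {"t1": "n1", "t2": "n1", "t3": "n2"}
scattered = {"t1": "n1", "t2": "n2", "t3": "n3"}

print(total_transfers(task_files, grouped))    # 5
print(total_transfers(task_files, scattered))  # 6
```

A shared file is sent once when its tasks are grouped on one node and once per node when they are scattered, which is the kind of sharing pattern a scheduling strategy has to account for.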