Efficient Graph Algorithms for Mapping Tasks to Processors

Sesha Kalyur, G S Nagaraja

doi:10.1007/978-3-030-47560-4_35

Abstract

Parallelization is the process of identifying instruction clusters, called tasks, that can be executed concurrently on multiple processing elements. The objective of parallel execution is to reduce program run time by overlapping the processing of these tasks. However, the tasks are not entirely independent and generally depend on data generated by their peers. One objective of effective parallelization is therefore to limit this inter-task communication, an overhead that can limit the speedup realized through parallelization. Since inter-task communication cannot be avoided entirely, judiciously mapping the tasks to the processors of a parallel machine assumes the utmost importance. When tasks outnumber processors, the performance of a parallel program is also determined by how uniformly the processors of the target machine are loaded with tasks, an activity normally referred to as load balancing. The problem of generating parallel tasks from a sequential program is well studied, but the related problem of effectively mapping the concurrent tasks to the processors of a parallel machine still needs research. This work studies the problem of mapping specific tasks to individual processors, referred to here as Processor Task Mapping, and seeks effective solutions to it. Previous solutions to the mapping problem were non-deterministic, sub-optimal, and incomplete, because they focused mainly on task schedules and load-balancing criteria and mostly ignored the communication profile of the tasks when making the mapping decision. This was the main motivation for our research. We present here the outcomes of this study, namely several efficient graph-based algorithms that perform this mapping effectively.
These algorithms are general and apply to machine topologies of diverse architectures, including Shared Memory Multiprocessor, Distributed Multiprocessor, and Non-Uniform Memory Access (NUMA) machines. They are scalable and deterministic, and they produce close-to-optimal solutions.

Full Text