Abstract

Symmetric multiprocessor (SMP) systems are increasingly common, not only as servers, but as a vehicle for executing a single application in parallel in order to reduce its execution latency. This paper presents PEDIGREE, a compilation tool that employs a new partitioning heuristic based on the program dependence graph (PDG). PEDIGREE creates overlapping, inter-dependent threads, each executing on a subset of the SMP's processors that matches the thread's available parallelism. A unified framework is used to build threads from procedures, loop nests, loop iterations, and smaller constructs. PEDIGREE does not require any parallel language support; it is a post-compilation tool that reads in object code. The SDIO Signal and Data Processing Benchmark Suite has been selected as an example of real-time, latency-sensitive code. Its coarse-grained data flow parallelism is exploited by PEDIGREE to achieve speedups of 1.56x/2.11x (mean/max) and 1.61x/2.60x on two and four processors, respectively. This is roughly a 15% improvement over existing techniques that exploit only data parallelism. By exploiting the unidirectional flow of data for coarse-grained pipelining, the synchronization overhead is typically limited to less than 6% for a synchronization latency of 100 cycles, and less than 2% for 10 cycles.
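
As a rough illustration of the coarse-grained pipelining idea, the sketch below shows two threads connected by a one-way queue, so that data flows in one direction only and the sole synchronization point is the hand-off between stages. It is a minimal sketch under stated assumptions, not PEDIGREE's actual transformation: PEDIGREE operates on object code, whereas this example is hand-written Python, and all names (`producer_stage`, `consumer_stage`, `run_pipeline`) and the stage computations are hypothetical. Python threads are used only to show the structure; they would not reproduce the reported speedups.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def producer_stage(items, out_q):
    """First pipeline stage: transform each item and pass it downstream."""
    for x in items:
        out_q.put(x * x)  # placeholder computation (assumption)
    out_q.put(SENTINEL)   # signal that no more data will arrive


def consumer_stage(in_q, results):
    """Second pipeline stage: consume values in arrival order."""
    while True:
        value = in_q.get()
        if value is SENTINEL:
            break
        results.append(value + 1)  # placeholder computation (assumption)


def run_pipeline(items):
    """Run both stages as overlapping threads linked by a one-way queue."""
    q = queue.Queue(maxsize=8)  # bounded buffer between the two stages
    results = []
    t1 = threading.Thread(target=producer_stage, args=(items, q))
    t2 = threading.Thread(target=consumer_stage, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Because data never flows back from the consumer to the producer, the two stages can overlap in time and only need to synchronize at the queue, which is the property the abstract credits for the low synchronization overhead.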
