Abstract

Symmetric multiprocessor systems are increasingly common, not only as high-throughput servers, but as a vehicle for executing a single application in parallel in order to reduce its execution latency. This article presents Pedigree, a compilation tool that employs a new partitioning heuristic based on the program dependence graph (PDG). Pedigree creates overlapping, potentially interdependent threads, each executing on a subset of the SMP processors that matches the thread’s available parallelism. A unified framework is used to build threads from procedures, loop nests, loop iterations, and smaller constructs. Pedigree does not require any parallel language support; it is post-compilation tool that reads in object code. The SDIO Signal and Data Processing Benchmark Suite has been selected as an example of real-time, latency-sensitive code. Its coarse-grained data flow parallelism is naturally exploited by Pedigree to achieve speedups of 1.63×/2.13× (mean/max) and 1.71×/2.41× on two and four processors, respectively. There is roughly a 20% improvement over existing techniques that exploit only data parallelism. By exploiting the unidirectional flow of data for coarse-grained pipelining, the synchronization overhead is typically limited to less than 6% for synchronization latency of 100 cycles, and less than 2% for 10 cycles.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call