Abstract
Symmetric multiprocessor systems are increasingly common, not only as high-throughput servers, but as a vehicle for executing a single application in parallel in order to reduce its execution latency. This article presents Pedigree, a compilation tool that employs a new partitioning heuristic based on the program dependence graph (PDG). Pedigree creates overlapping, potentially interdependent threads, each executing on a subset of the SMP processors that matches the thread’s available parallelism. A unified framework is used to build threads from procedures, loop nests, loop iterations, and smaller constructs. Pedigree does not require any parallel language support; it is post-compilation tool that reads in object code. The SDIO Signal and Data Processing Benchmark Suite has been selected as an example of real-time, latency-sensitive code. Its coarse-grained data flow parallelism is naturally exploited by Pedigree to achieve speedups of 1.63×/2.13× (mean/max) and 1.71×/2.41× on two and four processors, respectively. There is roughly a 20% improvement over existing techniques that exploit only data parallelism. By exploiting the unidirectional flow of data for coarse-grained pipelining, the synchronization overhead is typically limited to less than 6% for synchronization latency of 100 cycles, and less than 2% for 10 cycles.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.