Abstract

This paper describes the design of an automatic parallelization framework. The kernel supplied at its front end is proposed as an instrument for assessing parallel potential; we used it to measure the maximum achievable speedups for the major programs of the CHStone benchmark suite. Within this framework, we propose releasing parallelism incrementally. We introduce a data-dependence-heuristic-based transformation that dissociates true dependences and generates an internal representation ($ IR^{2} $) in which the Banerjee test conditions are met. Two of the three Banerjee test conditions are satisfied directly; on shared-memory many/multicore platforms, the third can be satisfied by privatization. Among the thread-mapping scenarios that become available in the $ IR^{2} $ structure, we can then choose a safe and opportune pairwise (mapping, privatization) scheme. As a proof of validity, we instrumented a subset of the CHStone benchmark suite, and the results confirm that our framework kernel is robust.
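To make the privatization step concrete, the following is a minimal, hypothetical C/OpenMP sketch, not taken from the paper's framework: a per-iteration temporary scalar carries anti and output dependences across iterations, and declaring it private to each thread removes them, which is the role privatization plays for the remaining dependence-test condition on shared-memory many/multicore targets.

    /* Minimal sketch (an assumption, not the paper's transformation):
     * the scalar t is written and read in every iteration, so as written
     * the loop carries anti/output dependences through t.  Privatizing t
     * gives each thread its own copy and removes those dependences. */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], t;
        for (int i = 0; i < N; i++) a[i] = i;

        /* t is privatized, so iterations may run on different threads safely;
         * only b[i] is written, and each iteration touches a distinct element. */
        #pragma omp parallel for private(t)
        for (int i = 0; i < N; i++) {
            t = a[i] * 2.0;   /* per-iteration temporary */
            b[i] = t + 1.0;   /* no cross-iteration conflict remains */
        }

        printf("b[0]=%f b[%d]=%f\n", b[0], N - 1, b[N - 1]);
        return 0;
    }

Compiled without OpenMP support the pragma is simply ignored and the loop runs serially with the same result, which is the usual safety property expected from this kind of transformation.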

Highlights

  • Demands for parallelizing frameworks are expected to increase considerably in the near future for two reasons

  • Parallelization frameworks [9,10,11,12] are source-to-source compilers. They mainly comprise modules that generate an internal representation (IR) after parsing, profiling modules, analysis engines, transformation passes, and reverse-parsing (unparsing) modules that generate the output sources

  • The Map is a set of directed graphs that express the program outputs and the whole processing involved in producing them (one possible encoding is sketched after this list)
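The following is only an illustrative guess at how a node of such a directed graph might be encoded in C; the type and field names (MapNode, preds, and so on) are assumptions for exposition, not the paper's actual IR.

    /* Illustrative sketch only: each node stands for a produced value and
     * points back to the nodes whose statements participate in producing it. */
    typedef struct MapNode {
        int              stmt_id;    /* statement producing this value          */
        const char      *symbol;     /* output variable or array element        */
        int              num_preds;  /* number of producing predecessors        */
        struct MapNode **preds;      /* edges to the nodes this one depends on  */
    } MapNode;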


Summary

Introduction

Demands for parallelizing frameworks are expected to increase considerably in the near future for two reasons. Fully automatic parallelization tools are supposed to be reliable instruments for parallel implementations, yet their popularity is not at the desired level. Data-dependence profiling has not been addressed exclusively for automatic parallelization purposes; it has been investigated in the wider context of compilation optimizations, covering optimizations such as partial redundancy elimination (PRE), runtime code scheduling [14], and performance tuning [16]. Integrating such profilers into autoparallelization tools remains challenging, since some of them, the dynamic data-dependence profilers [15,16,17], suffer from runtime overhead and memory overhead. In mainstream parallelization techniques, when we adopt the pairwise thread-to-loop-iteration mapping, the problem of privatizing data arises. We instrument a subset of the CHStone benchmark suite as a proof of validity of our proposals and report the results as approximate estimations of the theoretical speedups.
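To illustrate why dynamic data-dependence profilers incur runtime and memory overhead, here is a minimal, hypothetical C sketch, not one of the cited tools [15,16,17]: every write is shadowed with the iteration that performed it, so a later read from a different iteration exposes a loop-carried flow dependence, at the cost of a shadow entry per tracked address and a check on every access.

    /* Toy sketch (an assumption): a fixed-size shadow table recording, for
     * each address, the iteration that last wrote it.  Real profilers track
     * far more state, which is the source of the overheads mentioned above. */
    #include <stdio.h>

    #define SHADOW_SIZE 4096

    static long last_writer[SHADOW_SIZE];

    static void record_write(void *addr, long iter) {
        last_writer[((unsigned long)addr / sizeof(double)) % SHADOW_SIZE] = iter;
    }

    static void check_read(void *addr, long iter) {
        long w = last_writer[((unsigned long)addr / sizeof(double)) % SHADOW_SIZE];
        if (w != 0 && w != iter)
            printf("loop-carried flow dependence: iter %ld reads a value written in iter %ld\n",
                   iter, w);
    }

    int main(void) {
        double a[64] = {0};
        for (long i = 1; i < 64; i++) {   /* a[i] = a[i-1] + 1 carries a dependence */
            check_read(&a[i - 1], i);
            a[i] = a[i - 1] + 1.0;
            record_write(&a[i], i);
        }
        return 0;
    }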

Framework description
Instrumentation in the CHStone benchmark
Conclusion