Automatic parallelization of sequential programs combined with auto-tuning is an alternative to manual parallelization. As research directions have widened and the number of performance tuning tools has grown, choosing a particular tuning tool has become increasingly difficult. This paper reviews the fundamentals of different performance optimization and tuning techniques. It also surveys several tuning frameworks and classifies them into groups according to a set of criteria. Developing benchmarks for HPC and validating their accuracy are demanding tasks for computer architects, researchers, and application developers. In addition to surveying performance-tuning tools, we review current benchmarks in detail and discuss the requirements for future benchmarks. We also compare the tuning tools in detail on further features such as speedup and infrastructure details. We believe this study will be a useful resource for the parallel computing community, especially for early-stage parallel computing and performance researchers seeking exposure to existing performance optimization options.