Compiler Heuristics Research Articles

Recent compilers offer a vast number of multilayered optimizations targeting different code segments of an application. Choosing among these optimizations can significantly impact the performance of the code being optimized. The selection of the right set of compiler optimizations for a particular code segment is a very hard problem, but finding the best ordering of these optimizations adds further complexity. Finding the best ordering represents a long standing problem in compilation research, named the phase-ordering problem. The traditional approach of constructing compiler heuristics to solve this problem simply cannot cope with the enormous complexity of choosing the right ordering of optimizations for every code segment in an application. This article proposes an automatic optimization framework we call MiCOMP, which Mitigates the Compiler Phase-ordering problem. We perform phase ordering of the optimizations in LLVM’s highest optimization level using optimization sub-sequences and machine learning. The idea is to cluster the optimization passes of LLVM’s O3 setting into different clusters to predict the speedup of a complete sequence of all the optimization clusters instead of having to deal with the ordering of more than 60 different individual optimizations. The predictive model uses (1) dynamic features, (2) an encoded version of the compiler sequence, and (3) an exploration heuristic to tackle the problem. Experimental results using the LLVM compiler framework and the Cbench suite show the effectiveness of the proposed clustering and encoding techniques to application-based reordering of passes, while using a number of predictive models. We perform statistical analysis on the results and compare against (1) random iterative compilation, (2) standard optimization levels, and (3) two recent prediction approaches. We show that MiCOMP’s iterative compilation using its sub-sequences can reach an average performance speedup of 1.31 (up to 1.51). Additionally, we demonstrate that MiCOMP’s prediction model outperforms the -O1, -O2, and -O3 optimization levels within using just a few predictions and reduces the prediction error rate down to only 5%. Overall, it achieves 90% of the available speedup by exploring less than 0.001% of the optimization space.

Read full abstract

Directive-based programming of graphics processing units (GPUs) has recently appeared as a viable alternative to using specialized low-level languages such as CUDA C and OpenCL for general-purpose GPU programming. This technique, which uses “directive” or “pragma” statements to annotate source codes written in traditional high-level languages, is designed to permit a unified code base to serve multiple computational platforms. In this work we analyze the popular OpenACC programming standard, as implemented by the PGI compiler suite, in order to evaluate its utility and performance potential in computational fluid dynamics (CFD) applications. We examine the process of applying the OpenACC Fortran API to a test CFD code that serves as a proxy for a full-scale research code developed at Virginia Tech; this test code is used to asses the performance improvements attainable for our CFD algorithm on common GPU platforms, as well as to determine the modifications that must be made to the original source code in order to run efficiently on the GPU. Performance is measured on several recent GPU architectures from NVIDIA and AMD (using both double and single precision arithmetic) and the accelerator code is benchmarked against a multithreaded CPU version constructed from the same Fortran source code using OpenMP directives. A single NVIDIA Kepler GPU card is found to perform approximately 20× faster than a single CPU core and more than 2× faster than a 16-core Xeon server. An analysis of optimization techniques for OpenACC reveals cases in which manual intervention by the programmer can improve accelerator performance by up to 30% over the default compiler heuristics, although these optimizations are relevant only for specific platforms. Additionally, the use of multiple accelerators with OpenACC is investigated, including an experimental high-level interface for multi-GPU programming that automates scheduling tasks across multiple devices. While the overall performance of the OpenACC code is found to be satisfactory, we also observe some significant limitations and restrictions imposed by the OpenACC API regarding certain useful features of modern Fortran (2003/8); these are sufficient for us to conclude that it would not be practical to apply OpenACC to our full research code at this time due to the amount of refactoring required.

Read full abstract

Compiler Heuristics Research Articles

Related Topics

Articles published on Compiler Heuristics

Compilation Forking: A Fast and Flexible Way of Generating Data for Compiler-Internal Machine Learning Tasks

MiCOMP

Directive-based GPU programming for computational fluid dynamics

Continuous learning of compiler heuristics

Evolving Compiler Heuristics to Manage Communication and Contention

Meta optimization

Task Selection for the Multiscalar Architecture

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Compiler Heuristics Research Articles

Related Topics

Articles published on Compiler Heuristics

Compilation Forking: A Fast and Flexible Way of Generating Data for Compiler-Internal Machine Learning Tasks

MiCOMP

Directive-based GPU programming for computational fluid dynamics

Continuous learning of compiler heuristics

Evolving Compiler Heuristics to Manage Communication and Contention

Meta optimization

Task Selection for the Multiscalar Architecture