Checking correctness of code generator architecture specifications

Niranjan Hasabnis, Rui Qiao, R Sekar

doi:10.5555/2738600.2738622

Abstract

Modern instruction sets are complex, and extensions to them are proposed frequently. This makes modelling the architecture specifications used by the code generators of modern compilers complex and error-prone. Given the important role compilers play, they must be tested thoroughly so that most bugs are detected early. Unfortunately, modern compilers such as GCC are not tested at the level of individual components; instead, they undergo end-to-end testing. In this paper, we address the problem of checking the correctness of the architecture specifications used by the code generators of modern compilers. Our solution leverages the architecture of modern compilers, in which a language-specific front-end compiles source code into an intermediate representation (IR), which the compiler's code generator then translates into assembly code. Our approach therefore tests code generators by checking the equivalence of IR snippets and the corresponding generated assembly code. For this purpose, we have developed an efficient, architecture-neutral test case generation strategy. Using our prototype implementation, we checked the correctness of 140 assembly instructions (80 general-purpose and 60 SSE, out of around 600 x86 instructions) in GCC's x86 code generator, and found semantic differences in 39 of them; at least one of these has already been fixed by the GCC community in response to our report. We believe that our approach can be invaluable when developing support for a new architecture, as well as during the frequent updates made to existing architectures such as x86 to support new instructions.
