Low-Level Virtual Machine (LLVM) compiler infrastructure is a useful tool for building just-in-time (JIT) compilers, besides its reliable front end represented by a clang compiler and its elaborated middle end containing different optimizations that improve the runtime performance. This paper specifically addresses the part of building a JIT compiler using an LLVM with the scope of obtaining the hardware architecture details of the underlying machine such as the number of cores and the number of logical cores per processing unit and providing them to the NUMA-BTLP static thread classification algorithm and to the NUMA-BTDM static thread mapping algorithm. Afterwards, the hardware-aware algorithms are run using the JIT compiler within an optimization pass. The JIT compiler in this paper is designed to run on a parallel C/C++ application (which creates threads using Pthreads), before the first time the application is executed on a machine. To achieve this, the JIT compiler takes the native code of the application, obtains the corresponding LLVM IR (Intermediate Representation) for the native code and executes the hardware-aware thread classification and the thread mapping algorithms on the IR. The NUMA-Balanced Task and Loop Parallelism (NUMA-BTLP) and NUMA-Balanced Thread and Data Mapping (NUMA-BTDM) are expected to optimize the energy consumption by up to 15% on the NUMA systems.
Read full abstract