Abstract

The state-of-the-art Low Level Virtual Machine (LLVM) compiler infrastructure provides a dedicated set of loop optimizations. Each optimization is organized as a separate pass in LLVM, and passes are created using a mix of object creational patterns. However, modern compilers have recently focused on improving runtime performance through a large set of conservative optimizations, most often neglecting the impact on energy consumption. This paper introduces the implementation of a new loop fusion algorithm designed for LLVM, which aims to improve both the runtime performance and the energy consumption of parallel codes that rely on loop parallelism. The proposed algorithm merges two non-dependent loops that have the same number of iterations and no code between them. Two loops are dependent when the first loop has to finish before the second can start, whereas two independent loops may not be allowed to execute in parallel. This paper also proves that loop fusion is useful in optimizing loop parallelism, since fusing two loops halves the number of threads that would otherwise be required to execute the iterations (when there is a one-to-one relation between threads and iterations). The reduced number of threads lowers the parallelization overhead, which in turn improves energy consumption. The improvements are discussed in the context of Non-Uniform Memory Access (NUMA) systems.
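
To make the transformation described above concrete, the following is a minimal source-level sketch of loop fusion in C. It is not the paper's LLVM pass, which operates on LLVM's intermediate representation; the function names, the array arguments, and the arithmetic in the loop bodies are hypothetical and chosen only for illustration. The sketch assumes the three arrays do not overlap (expressed with `restrict`), so the second loop never reads anything the first loop writes, both loops run exactly n iterations, and no code sits between them.

```c
#include <stddef.h>

/* Before fusion: two adjacent loops with the same trip count n.
 * Hypothetical example: the arrays are assumed not to overlap, so the
 * second loop does not depend on any value produced by the first. */
void scale_and_offset_unfused(const double *restrict in,
                              double *restrict a,
                              double *restrict b,
                              size_t n) {
    for (size_t i = 0; i < n; ++i) {
        a[i] = 2.0 * in[i];
    }
    for (size_t i = 0; i < n; ++i) {
        b[i] = in[i] + 1.0;
    }
}

/* After fusion: both bodies are placed in a single loop over the same
 * iteration space. Each iteration now performs the work that previously
 * belonged to one iteration of each original loop. */
void scale_and_offset_fused(const double *restrict in,
                            double *restrict a,
                            double *restrict b,
                            size_t n) {
    for (size_t i = 0; i < n; ++i) {
        a[i] = 2.0 * in[i];
        b[i] = in[i] + 1.0;
    }
}
```

Both functions write the same values to `a` and `b`. Under the one-to-one mapping of threads to iterations mentioned above, parallelizing the unfused version would use n threads for each loop, 2n in total, whereas the fused version would use n.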
