An Unfolding-Based Loop Optimization Technique

TL;DR

This paper introduces an unfolding-based loop optimization technique that transforms "badly-structured" loops into well-structured ones by unfolding initial iterations, thereby exposing opportunities for traditional optimizations like loop invariant code motion, unrolling, and peeling, potentially enhancing compiler performance on complex loops.

Abstract

Loops in programs are the source of many optimizations for improving program performance, particularly on modern high-performance architectures as well as vector and multithreaded systems. Techniques such as loop invariant code motion, loop unrolling and loop peeling have demonstrated their utility in compiler optimizations. However, many of these techniques can only be used in very limited cases when the loops are "well-structured" and easy to analyze. For instance, loop invariant code motion works only when invariant code is inside loops; loop unrolling and loop peeling work effectively when the array references are either constants or affine functions of the index variable. It is our contention that there are many opportunities overlooked by limiting the optimizations to well-structured loops. In many cases, even "badly-structured" loops may be transformed into well-structured loops. As a case in point, we show how some loop-dependent code can be transformed into loop-invariant code by transforming the loops. Our technique described in this paper relies on unfolding the loop for several initial iterations such that more opportunities may be exposed for many other existing compiler optimization techniques such as loop invariant code motion, loop peeling, loop unrolling, and so on.

Keywords (added by machine, not by the authors): Affine Function, Control Dependence, Compiler Optimization, Instruction Level Parallelism, Dependence Edge
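The idea of turning loop-dependent code into loop-invariant code by unfolding can be illustrated with a small sketch. It is not taken from the paper: the functions `weight`, `original`, and `unfolded` are hypothetical names, and Python is used only for readability. The variable `k` changes during the first iteration and is stable afterwards, so `weight(k)` is not invariant in the original loop but becomes invariant once the first iteration is unfolded.

```python
def weight(k):
    """Hypothetical stand-in for a costly, side-effect-free computation."""
    return k * k + 1


def original(a, k0, m):
    """Loop in which weight(k) looks loop-dependent because k changes once."""
    out = []
    k = k0
    for x in a:
        out.append(x * weight(k))  # k is k0 in iteration 1, m afterwards
        k = m
    return out


def unfolded(a, k0, m):
    """Same computation after unfolding the first iteration by hand."""
    out = []
    if not a:
        return out
    # Unfolded first iteration: the only one that observes k0.
    out.append(a[0] * weight(k0))
    # In the residual loop k is always m, so weight(m) is invariant
    # and can be hoisted out of the loop.
    w = weight(m)
    for x in a[1:]:
        out.append(x * w)
    return out
```

Both versions return the same list for any input; the unfolded one calls `weight` at most twice instead of once per element. A real compiler would have to prove that `weight` has no side effects before hoisting it.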

Similar Papers
  • Research Article
  • Cite Count: 10
  • 10.1145/967278.967284
What can we gain by unfolding loops?
  • Feb 1, 2004
  • ACM SIGPLAN Notices
  • Litong Song + 1 more

Loops in programs are the source of many optimizations for improving program performance, particularly on modern high-performance architectures as well as vector and multithreaded systems. Techniques such as loop invariant code motion, loop unrolling and loop peeling have demonstrated their utility in compiler optimizations. However, many of these techniques can only be used in very limited cases when the loops are "well-structured" and easy to analyze. For instance, loop invariant code motion works only when invariant code is inside loops; loop unrolling and loop peeling work effectively when the array references are either constants or affine functions of index variable. It is our contention that there are many opportunities overlooked by limiting the optimizations to "well structured" loops. In many cases, even "badly-structured" loops may be transformed into "well structured" loops. As a case in point, we show how some loop-dependent code can be transformed into loop-independent code by transforming the loops. Our technique described in this paper relies on unfolding the loop for several initial iterations such that more opportunities may be exposed for many other existing compiler optimization techniques such as loop invariant code motion, loop peeling, loop unrolling and so on.

  • Research Article
  • 10.52783/cana.v32.2860
The Opticode: A User-Centric Tool for Enhancing Software Efficiency and Minimizing Errors Through Dead Code Elimination and Loop Invariant Code Motion Techniques
  • Dec 18, 2024
  • Communications on Applied Nonlinear Analysis
  • Tulshihar Patil

Introduction: This article introduces OptiCode, a complex software tool that uses loop invariant code motion and dead code elimination, among other advanced code optimization techniques, to improve code efficiency and decrease compile time. Using Loop Invariant Code Motion (LICM) and Abstract Syntax Trees (ASTs) for precise code analysis, OptiCode efficiently detects and eliminates redundant code, as well as optimizes loop structures by removing 4.87% of dead code with an efficiency of 5.38. OptiCode outperforms other apps in comparison, as seen by the considerable compile time savings and excellent efficiency ratings that it achieves. Objectives: To remove the unused code and elements affecting the efficacy of the code. Methods: Source code is passed as an input, then the lexer performs the tokenization. Tokenized words are processed by the parser to assess the syntax of the code. A customized Abstract Syntax Tree removes the dead code, and the data is passed to a customized Loop Invariant Code Motion pass, which optimizes the looping structure in the code. At last, the optimized code is generated. Results: OptiCode outperformed Taskapp (3.89), Agilla (3.67), and Rfmtoleds (3.45) with its greatest efficiency rating of 5.38 on a 10-point scale in our comparison research. The 731 lines of code in the OptiCode codebase include 150 lines of dead code and 57 variables that aren't used. Conclusions: The code is optimized to save space and time. Performance increases as number of lines increases.

  • Book Chapter
  • Cite Count: 1
  • 10.1007/11688839_11
Loop Transformations in the Ahead-of-Time Optimization of Java Bytecode
  • Jan 1, 2006
  • Simon Hammond + 1 more

Loop optimizations such as loop unrolling, unfolding and invariant code motion have long been used in a wide variety of compilers to improve the running time of applications. In this paper we present a series of experimental results detailing the effect these techniques have on the running time of Java applications following ahead of time optimization. We also detail the optimization tools and transformations developed for this paper which extend the SOOT framework discussed in a number of previous papers on the subject. Our experimentation, conducted on the SciMark 2.0 benchmarking suite, demonstrates that when optimized using the techniques mentioned, Java applications can benefit from performance improvements of up to 20%. We finish with a discussion of the results obtained, including results on how the optimizations affect JIT compilation and class size and proceed to argue that ahead-of-time loop unrolling and unfolding optimization may have a role to play in improving the performance of Java applications, particularly in scientific applications.

  • Book Chapter
  • Cite Count: 1
  • 10.1007/11802839_55
High-Level Synthesis Using SPARK and Systolic Array
  • Jan 1, 2006
  • Jae-Jin Lee + 1 more

Recently, SPARK parallelizing high-level synthesis software tool has been developed. It takes a behavioral ANSI-C code as an input, schedules it using speculative code motions and loop transformations, generates a finite state machine for the scheduled design graph, and then finally outputs a synthesizable RTL VHDL code. To handle loop algorithm, SPARK employs various loop transformations such as loop invariant code motion, loop unrolling, loop index variable elimination and loop shifting. In loop synthesis, however, SPARK does not produce circuit description whose quality can compete with manual designs. With the objective of improving the quality of high-level synthesis results for designs with loops, this paper shows an upgrade of SPARK through transforming nested loops into a 2-D systolic array to increase parallelism. The C-to-VHDL loop synthesis in this paper achieves synthesis results that are better than those achieved from a current version of SPARK for matrix-matrix multiplication and FIR filter, and can be incorporated into SPARK parallelizing high-level synthesis framework.

Keywords (added by machine, not by the authors): Nest Loop, Systolic Array, Total Execution Time, Hardware Complexity, Synthesis Result

  • Conference Article
  • Cite Count: 7
  • 10.1145/1534530.1534550
The effect of unrolling and inlining for Python bytecode optimizations
  • May 4, 2009
  • Yosi Ben Asher + 1 more

In this study, we consider bytecode optimizations for Python, a programming language which combines object-oriented concepts with features of scripting languages, such as dynamic dictionaries. Due to its design nature, Python is relatively slow compared to other languages. It operates through compiling the code into powerful bytecode instructions that are executed by an interpreter. Python's speed is limited due to its interpreter design, and thus there is a significant need to optimize the language. In this paper, we discuss one possible approach and limitations in optimizing Python based on bytecode transformations. In the first stage of the proposed optimizer, the bytecode is expanded using function inline and loop unrolling. The second stage of transformations simplifies the bytecode by applying a complete set of data-flow optimizations, including constant propagation, algebraic simplifications, dead code elimination, copy propagation, common sub expressions elimination, loop invariant code motion and strength reduction. While these optimizations are known and their implementation mechanism (data flow analysis) is well developed, they have not been successfully implemented in Python due to its dynamic features which prevent their use. In this work we attempt to understand the dynamic features of Python and how these features affect and limit the implementation of these optimizations. In particular, we consider the significant effects of first unrolling and then inlining on the ability to apply the remaining optimizations. The results of our experiments indicate that these optimizations can indeed be implemented and dramatically improve execution times.
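The interplay between unrolling and the later data-flow passes can be sketched at source level. The paper itself works on bytecode; the hand-written functions below (`looped` and `unrolled_and_folded`, both hypothetical names) only illustrate why expanding a loop with a known trip count exposes constants that constant propagation and algebraic simplification can then fold.

```python
def looped(v):
    """Original form: the index i and the factor 2 ** i vary per iteration."""
    total = 0
    for i in range(4):
        total += v[i] * (2 ** i)
    return total


def unrolled_and_folded(v):
    """After full unrolling, i is a constant in each copy of the body.

    Constant propagation turns 2 ** 0, 2 ** 1, 2 ** 2, 2 ** 3 into
    1, 2, 4, 8, and algebraic simplification removes the multiply by 1
    and the initial addition to zero.
    """
    return v[0] + v[1] * 2 + v[2] * 4 + v[3] * 8
```

The two functions agree for any sequence of at least four numbers. In real Python the rewrite is only safe if names such as `range` have not been rebound, which is one of the dynamic features the abstract points to as an obstacle.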

  • Research Article
  • Cite Count: 46
  • 10.1109/tce.2017.015072
DSP design protection in CE through algorithmic transformation based structural obfuscation
  • Nov 1, 2017
  • IEEE Transactions on Consumer Electronics
  • Anirban Sengupta + 3 more

Structural obfuscation offers a means to effectively secure the contents of intellectual property (IP) cores used in an electronic system-on-chip (SoC). This work presents a novel structural obfuscation methodology for protecting a digital signal processor (DSP) IP core at the architectural synthesis design stage. The proposed approach specifically targets protection of IP cores that involve complex loops. Five different algorithmic level transformation techniques are employed: loop unrolling, loop invariant code motion, tree height reduction/increment, logic transformation and redundant operation removal. Each of these can yield camouflaged functionally equivalent designs. In addition, a low-cost obfuscated design is generated through the use of multi-stage algorithmic transformation and particle swarm optimization (PSO)-driven design space exploration (DSE). The proposed approach yielded an obfuscation enhancement of 22% and a reduction in obfuscated design cost of 55% compared to similar prior art.

  • Conference Article
  • Cite Count: 6
  • 10.1109/iccd.1988.25700
CTP-A family of optimizing compilers for the NS32532 microprocessor
  • Oct 3, 1988
  • C Bendelac + 1 more

Techniques for generating highly optimized code for a pipelined microprocessor, the NS32532, and its fast floating point slave processor, the NS32580, are described in the context of the CTP family of optimizing compilers. All CTP compilers are constructed from three separate parts: a language-dependent compiler front-end, a shared global optimizer, and a shared code generator. In addition to most classical transformations, such as value propagation, redundant and dead code elimination, loop invariant code motion, global strength reduction and register allocation, the CTP compilers also perform less common optimizations, such as loop unrolling, basic block reorganization, code reordering, and profile feedback utilization. The relative influence of the different optimizations on the performance of the NS32532 using several standard benchmark programs is presented.

  • Book Chapter
  • 10.1201/9781003127598-1-1
Securing Dedicated DSP Co-processors (Hardware IP) using Structural Obfuscation for IoT-oriented Platforms
  • Jul 14, 2021
  • Anirban Sengupta + 1 more

Internet of Things (IoT) has become an integral part of modern life. IoT oriented platforms are comprised of digital signal processing (DSP) coprocessors suitable for low power high performance applications, compared to traditional counterparts such as microprocessors. However, DSP coprocessors are not entirely designed in-house due to the global design supply chain, resulting into security threats at the hardware level. Some of the prominent hardware security threats for such devices used in IoT oriented platforms could be backdoor Trojan insertion, reverse engineering, etc. This chapter discusses some of the standard structural obfuscation approaches used for securing dedicated DSP coprocessors, as well as the structural obfuscation approaches that make the DSP hardware unobvious (and uninterpretable) from an attacker’s perspective. More explicitly, state of the art structural obfuscation approaches such as compiler-driven transformation techniques, hybrid transformation techniques, hologram based obfuscation techniques and key-based structural obfuscation techniques are discussed. Adopting a distinct and integrated approach, it aims to elaborate on the transformation processes for structural obfuscation, such as logic transformation, tree height transformation, partitioning, loop unrolling, loop invariant code motion, folding knob, redundant operation elimination, and so on. Demonstrations use DSP applications such as finite impulse response filter, discrete cosine transformation and other digital filters. Also presented is comparative analysis of the structural obfuscation approaches used for DSP applications.

  • Research Article
  • Cite Count: 97
  • 10.1145/1027084.1027087
Coordinated parallelizing compiler optimizations and high-level synthesis
  • Oct 1, 2004
  • ACM Transactions on Design Automation of Electronic Systems
  • Sumit Gupta + 3 more

We present a high-level synthesis methodology that applies a coordinated set of coarse-grain and fine-grain parallelizing transformations. The transformations are applied both during a pre-synthesis phase and during scheduling, with the objective of optimizing the results of synthesis and reducing the impact of control flow constructs on the quality of results. We first apply a set of source level presynthesis transformations that include common sub-expression elimination (CSE), copy propagation, dead code elimination and loop-invariant code motion, along with more coarse-level code restructuring transformations such as loop unrolling. We then explore scheduling techniques that use a set of aggressive speculative code motions to maximally parallelize the design by re-ordering, speculating and sometimes even duplicating operations in the design. In particular, we present a new technique called "Dynamic CSE" that dynamically coordinates CSE and code motions such as speculation and conditional speculation during scheduling. We implemented our parallelizing high-level synthesis in the SPARK framework. This framework takes a behavioral description in ANSI-C as input and generates synthesizable register-transfer level VHDL. Our results from computationally expensive portions of three moderately complex design targets, namely, MPEG-1, MPEG-2 and the GIMP image processing tool, validate the utility of our approach to the behavioral synthesis of designs with complex control flows.

  • Conference Article
  • Cite Count: 1
  • 10.1145/1289881.1289912
Facilitating compiler optimizations through the dynamic mapping of alternate register structures
  • Sep 30, 2007
  • Chris Zimmer + 4 more

Aggressive compiler optimizations such as software pipelining and loop invariant code motion can significantly improve application performance, but these transformations often require the use of several additional registers to hold data values across one or more loop iterations. Compilers that target embedded systems may often have difficulty exploiting these optimizations since many embedded systems typically do not have as many general purpose registers available. Alternate register structures like register queues can be used to facilitate the application of these optimizations due to common reference patterns. In this paper, we propose a microarchitectural technique that permits these alternate register structures to be efficiently mapped into a given processor architecture and automatically exploited by an optimizing compiler. We show that this minimally invasive technique can be used to facilitate the application of software pipelining and loop invariant code motion for a variety of embedded benchmarks. This leads to performance improvements for the embedded processor, as well as new opportunities for further aggressive optimization of embedded systems software due to a significant decrease in the register pressure of tight loops.

  • Research Article
  • 10.1145/2490301.2451136
DeAliaser
  • Mar 16, 2013
  • ACM SIGARCH Computer Architecture News
  • Wonsun Ahn + 2 more

Alias analysis is a critical component in many compiler optimizations. A promising approach to reduce the complexity of alias analysis is to use speculation. The approach consists of performing optimizations assuming the alias relationships that are true most of the time, and repairing the code when such relationships are found not to hold through runtime checks. This paper proposes a general alias speculation scheme that leverages upcoming hardware support for transactions with the help of some ISA extensions. The ability of transactions to checkpoint and roll back frees the compiler to pursue aggressive optimizations without having to worry about recovery code. Also, exposing the memory conflict detection hardware in transactions to software allows runtime checking of aliases with little or no overhead. We test the potential of the novel alias speculation approach with Loop Invariant Code Motion (LICM), Global Value Numbering (GVN), and Partial Redundancy Elimination (PRE) optimization passes. On average, they are shown to reduce program execution time by 9% in SPEC FP2006 applications and 3% in SPEC INT2006 applications over the alias analysis of a state-of-the-art compiler.
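A rough software-only sketch of speculative loop invariant code motion guarded by a runtime alias check follows. It is an illustration, not the paper's mechanism: the paper relies on hardware transactions and ISA extensions for detection and rollback, whereas this sketch substitutes an explicit identity test on plain Python lists. The function names `baseline` and `speculative` are hypothetical.

```python
def baseline(dst, src):
    """Unoptimized loop: src[0] is reloaded on every iteration."""
    for i in range(len(dst)):
        dst[i] = src[0] + 1


def speculative(dst, src):
    """Hoist the load of src[0], assuming dst and src do not alias.

    Assumption: both arguments are plain lists, which cannot partially
    overlap, so an identity test is a sufficient alias check here.
    """
    if dst is src:
        # Speculation would be wrong: a write to dst[0] changes src[0].
        # Fall back to the unoptimized loop.
        baseline(dst, src)
        return
    base = src[0] + 1  # hoisted loop-invariant computation
    for i in range(len(dst)):
        dst[i] = base
```

When the two lists are distinct, the hoisted version writes the same value everywhere. When they are the same object, the first store changes `src[0]`, so only the fallback gives the right answer:

```python
a = [5, 0, 0]
speculative(a, a)       # aliasing detected, baseline runs
b = [0, 0, 0]
speculative(b, [5])     # no aliasing, hoisted version runs
```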

  • Conference Article
  • Cite Count: 6
  • 10.1145/2451116.2451136
DeAliaser
  • Mar 16, 2013
  • Wonsun Ahn + 2 more

Alias analysis is a critical component in many compiler optimizations. A promising approach to reduce the complexity of alias analysis is to use speculation. The approach consists of performing optimizations assuming the alias relationships that are true most of the time, and repairing the code when such relationships are found not to hold through runtime checks. This paper proposes a general alias speculation scheme that leverages upcoming hardware support for transactions with the help of some ISA extensions. The ability of transactions to checkpoint and roll back frees the compiler to pursue aggressive optimizations without having to worry about recovery code. Also, exposing the memory conflict detection hardware in transactions to software allows runtime checking of aliases with little or no overhead. We test the potential of the novel alias speculation approach with Loop Invariant Code Motion (LICM), Global Value Numbering (GVN), and Partial Redundancy Elimination (PRE) optimization passes. On average, they are shown to reduce program execution time by 9% in SPEC FP2006 applications and 3% in SPEC INT2006 applications over the alias analysis of a state-of-the-art compiler.

  • Research Article
  • Cite Count: 1
  • 10.1145/2499368.2451136
DeAliaser
  • Mar 16, 2013
  • ACM SIGPLAN Notices
  • Wonsun Ahn + 2 more

Alias analysis is a critical component in many compiler optimizations. A promising approach to reduce the complexity of alias analysis is to use speculation. The approach consists of performing optimizations assuming the alias relationships that are true most of the time, and repairing the code when such relationships are found not to hold through runtime checks. This paper proposes a general alias speculation scheme that leverages upcoming hardware support for transactions with the help of some ISA extensions. The ability of transactions to checkpoint and roll back frees the compiler to pursue aggressive optimizations without having to worry about recovery code. Also, exposing the memory conflict detection hardware in transactions to software allows runtime checking of aliases with little or no overhead. We test the potential of the novel alias speculation approach with Loop Invariant Code Motion (LICM), Global Value Numbering (GVN), and Partial Redundancy Elimination (PRE) optimization passes. On average, they are shown to reduce program execution time by 9% in SPEC FP2006 applications and 3% in SPEC INT2006 applications over the alias analysis of a state-of-the-art compiler.

  • Conference Article
  • Cite Count: 51
  • 10.1109/cgo.2005.2
A Model-Based Framework: An Approach for Profit-Driven Optimization
  • Mar 20, 2005
  • Min Zhao + 2 more

Although optimizations have been applied for a number of years to improve the performance of software, problems that have been long-standing remain, which include knowing what optimizations to apply and how to apply them. To systematically tackle these problems, we need to understand the properties of optimizations. In our current research, we are investigating the profitability property, which is useful for determining the benefit of applying an optimization. Due to the high cost of applying optimizations and then experimentally evaluating their profitability, we use an analytic model framework for predicting the profitability of optimizations. In this paper, we target scalar optimizations, and in particular, describe framework instances for partial redundancy elimination (PRE) and loop invariant code motion (LICM). We implemented the framework for both optimizations and compare profit-driven PRE and LICM with a heuristic-driven approach. Our experiments demonstrate that a model-based approach is effective and efficient in that it can accurately predict the profitability of optimizations with low overhead. By predicting the profitability using models, we can selectively apply optimizations. The model-based approach does not require tuning of parameters used in heuristic approaches and works well across different code contexts and optimizations.

  • Conference Article
  • Cite Count: 19
  • 10.1109/sefm.2006.4
A PVS Based Framework for Validating Compiler Optimizations
  • Sep 11, 2006
  • A Kanade + 2 more

An optimization can be specified as sequential compositions of predefined transformation primitives. For each primitive, we can define soundness conditions which guarantee that the transformation is semantics preserving. An optimization of a program preserves semantics, if all applications of the primitives in the optimization satisfy their respective soundness conditions on the versions of the input program on which they are applied. This scheme does not directly check semantic equivalence of the input and the optimized programs and is therefore amenable to automation. Automating this scheme however requires a trusted framework for simulating transformation primitives and checking their soundness conditions. In this paper, we present the design of such a framework based on PVS. We have used it for specifying and validating several optimizations viz. common subexpression elimination, optimal code placement, lazy code motion, loop invariant code motion, full and partial dead code elimination, etc.
