Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops

Jian Wang, Guang R. Gao

doi:10.1007/3-540-61053-7_49

Abstract

The objective of software pipelining is to generate code that maximally exploits instruction-level parallelism (ILP) in modern multi-issue processor architectures, such as VLIW and superscalar processors. Since the amount of ILP is usually fixed to a small number (four to eight), modern compilers using state-of-the-art software pipelining scheduling techniques have been able to schedule instructions in a small window of successive iterations and keep the machine resources usefully busy. To take full advantage of software pipelining, it is beneficial if the loops to be software pipelined have a large number of iterations (called the trip count in this paper). Therefore, software pipelining of nested loops becomes important, especially when the innermost loops have small trip counts.

This paper presents a loop transformation that extends software pipelining from the innermost loops to the enclosing loop nests. Unlike some popular loop transformation techniques (e.g., unimodular transformations) targeted at multiprocessor machines, where the goal has been to maximally expose loop-level parallelism (i.e., the transformed loop nests have the maximum number of doall loops), the goal of our transformation, pipelining-dovetailing, is to extend the software pipelining of the innermost loop to the surrounding loop nests. Thus all iterations of the loop nest can be smoothly software pipelined through, and the effective trip count is maximized. We also define the condition under which pipelining-dovetailing is valid. As a result, a software pipelining framework is derived for loop nests that integrates software pipelining and pipelining-dovetailing.

Keywords

Instruction-Level Parallelism, Fine-Grain Parallelism, Software Pipelining, Loop Scheduling, Nested Loop, Very Long Instruction Word (VLIW), Superscalar
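The abstract's point about small inner trip counts can be made concrete with a rough cycle-count model. The sketch below is not taken from the paper. It assumes the common idealization that a software-pipelined loop with N iterations, S pipeline stages, and initiation interval II takes (N + S - 1) * II cycles, and it ignores dependences, resource limits, and the validity condition the paper defines. All function and parameter names are hypothetical.

```python
def pipelined_cycles(trip_count: int, stages: int, ii: int) -> int:
    """Idealized cycle count for one software-pipelined loop.

    Assumes (trip_count + stages - 1) * ii cycles: the pipeline needs
    stages - 1 extra initiation intervals to fill and drain.
    """
    if trip_count <= 0:
        return 0
    return (trip_count + stages - 1) * ii


def nest_cycles_inner_only(outer: int, inner: int, stages: int, ii: int) -> int:
    """Pipeline only the innermost loop: fill/drain is paid once per outer iteration."""
    return outer * pipelined_cycles(inner, stages, ii)


def nest_cycles_dovetailed(outer: int, inner: int, stages: int, ii: int) -> int:
    """Idealized best case (an assumption): all outer * inner iterations flow
    through a single pipeline, so fill/drain is paid only once."""
    return pipelined_cycles(outer * inner, stages, ii)


if __name__ == "__main__":
    # Hypothetical loop nest: 100 outer iterations, inner trip count of 4,
    # a 5-stage software pipeline, and an initiation interval of 1 cycle.
    outer, inner, stages, ii = 100, 4, 5, 1
    print("inner-only:", nest_cycles_inner_only(outer, inner, stages, ii))
    print("dovetailed:", nest_cycles_dovetailed(outer, inner, stages, ii))
```

Under these assumptions, the gap between the two counts grows with the stage count and shrinks as the inner trip count grows, which is why the transformation matters most for short innermost loops.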
