A method for applying loop unrolling and software pipelining to instruction-level parallel architectures

Nobuhiro Kondo, Yoshiaki Fukazawa, Akira Koseki, Hideaki Komatsu

doi:10.1002/(sici)1520-684x(199808)29:9<62::aid-scj7>3.0.co;2-h

Abstract

Loops account for a considerable part of program execution time, so loop optimization is highly effective, especially for the innermost loops of a program. Software pipelining and loop unrolling are well-known loop optimization methods. Software pipelining has the advantage that the code becomes only slightly longer. It is, however, difficult to apply if the loop includes branching when parallelism is limited. Loop unrolling is free of such limitations but suffers from a number of drawbacks. In particular, the code size grows substantially, and it is difficult to determine the optimal number of body replications. To solve these problems, it seems important to combine software pipelining with loop unrolling so as to exploit the advantages of both techniques, with due regard to the properties of the program under consideration and to the available machine resources. This paper describes a method for applying optimal loop unrolling and effective software pipelining to achieve this goal. The method takes into account program characteristics obtained from an extended PDG (program dependence graph) as well as machine resources. © 1998 Scripta Technica, Syst Comp Jpn, 29(9): 62–73, 1998
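
For readers unfamiliar with the two transformations named in the abstract, the following is a minimal hand-written sketch in C. It is not the paper's method: the kernel (`y[i] = a * x[i] + y[i]`), the function names, and the unroll factor of 4 are all assumptions chosen for illustration, and the sketch assumes `x` and `y` do not partially overlap. It shows only the trade-off the abstract describes: unrolling replicates the loop body (more code, fewer branches), while software pipelining overlaps stages of consecutive iterations with a short prologue and epilogue.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline loop: one iteration per element. */
void saxpy_rolled(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Loop unrolling by an (arbitrarily chosen) factor of 4.
 * The body is replicated four times; a remainder loop handles
 * the last n % 4 elements. Code size grows with the factor. */
void saxpy_unrolled4(size_t n, float a, const float *x, float *y) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-written two-stage software pipeline.
 * Stage 1 (loads) of iteration i+1 is issued alongside
 * stage 2 (multiply-add and store) of iteration i. */
void saxpy_pipelined(size_t n, float a, const float *x, float *y) {
    if (n == 0)
        return;
    assert(x != NULL && y != NULL);

    /* Prologue: stage 1 of iteration 0. */
    float xv = x[0];
    float yv = y[0];

    /* Steady state: loads for i+1 overlap compute/store for i. */
    for (size_t i = 0; i + 1 < n; i++) {
        float x_next = x[i + 1];
        float y_next = y[i + 1];
        y[i] = a * xv + yv;
        xv = x_next;
        yv = y_next;
    }

    /* Epilogue: stage 2 of the final iteration. */
    y[n - 1] = a * xv + yv;
}
```

All three functions compute the same result; in practice a compiler performs these transformations on the instruction schedule rather than on source code, and the choice of unroll factor is exactly the kind of decision the abstract says is hard to make well.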