Towards efficient fine-grain software pipelining

Guang R Gao,Yue-Bong Wong,Herbert H J Hum

doi:10.1145/77726.255177

Abstract

Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources[9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture which employs conventional pipelined techniques to achieve fast instruction execution, while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must be able to make a balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine.A detailed analysis based on simulation results is presented, focusing on two key architectural factors - the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other, the experiments have also revealed the (somewhat pessimistic) fact that even a fully software pipelined code may not achieve good performance if the overhead for fine-grain synchronization exceeds the capacity of the machine.

Full Text