Abstract

Loop unrolling can provide more instruction-level parallelism opportunities for code and enables a greater range of instruction pipeline scheduling. In high-performance very-long-instruction-word (VLIW) digital signal processors (DSPs), there are special registers to address. To further improve the instruction-level parallelism of code for such DSPs by making full use of these registers, in this paper, we propose a more effective loop unrolling approach through extending memory accessing (LUAEMA). In this approach, the final unrolling factor is computed by a model in which every register kind and every memory accessing operation are considered. For basic digital signal processing algorithms, the unrolling factor under the LUAEMA is larger than that under the conventional loop unrolling approach. We also provide the opportunity to reduce the number of instructions in a loop during the code transformation of loop unrolling. The experimental results show that the loop unrolling approach proposed in this paper can achieve an average speedup ratio ranging from 1.14 to 1.81 compared with the conventional loop unrolling approach. For some algorithms, the peak speedup ratio is up to 2.11.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call