Abstract

VLIW processors typically deliver high performance on limited budget making them ideal for a variety of communication and signal processing solutions. These processors typically need large multi-ported register files that can have side effects of increased cycle time and high power consumption. The access delay and energy of these register files can also become prohibitive when increasing the register count or the access ports, thus limiting the overall performance of the processor. Most prior art circumvent this problem by using multiple clusters with private register files, to lower the access delay and reduce energy consumption. However, clustering artifacts, like increased inter--cluster communication operations and spill-recovery code, result in a performance penalty. This paper proposes CURE — a novel technique to considerably reduce the negative effects of clustering. CURE augments the ISA to expose the communication registers to the compilers to increase availability of architectural register state to all functional units. The inter--cluster communication operations are integrated into regular ALU and memory operations to improve instruction encoding efficiency. We also propose a new code scheduling heuristic to handle the ISA changes, and to realize the improvements in processor’s performance and energy consumption. Our quantitative analysis estimates that CURE, when compared to the baseline 8--issue uni--cluster processor, boosts average performance by 61% while reducing the average register dynamic energy by 77%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call