Instruction Fusion for Multiscalar and Many-Core Processors

Yaojie Lu, Sotirios G Ziavras

doi:10.1007/s10766-015-0386-1

Abstract

The utilization wall, caused by the breakdown of threshold voltage scaling, hinders performance gains for new-generation microprocessors. We propose an instruction fusion technique for multiscalar and many-core processors to alleviate its impact. With instruction fusion, similar copies of an instruction to be run on multiple pipelines or cores are merged into a single copy for simultaneous execution. Applied to vector code, instruction fusion enables the processor to idle early pipeline stages and instruction caches at various times during program execution with minimal performance degradation, while also reducing program size and the required instruction memory bandwidth. Instruction fusion is applied here to a MIPS-based dual-core processor that resembles an ideal multiscalar of degree two. Benchmarking on an FPGA prototype shows a 6–11% reduction in dynamic power dissipation for the targeted applications, as well as a 17–45% decrease in code size, with frequent performance improvements due to higher instruction cache hit rates.

Full Text