Abstract

FPGA-based soft processors have traditionally focused on fixed-pipeline designs. These designs offer limited instruction-level parallelism (ILP) and constrain the integration of tightly coupled accelerators, potentially limiting the speedup those accelerators can provide. Recently, it has been proposed that replacing the fixed-pipeline datapath in these soft processors with variable-latency, parallel-execution functional units could facilitate the integration of custom instructions. In this paper, we discuss and analyze the architectural impact and requirements of decoupling the pipeline stages and supporting parallel execution units. We find that, relative to a fixed-pipeline architecture, our variable-latency, parallel-execution architecture increases resource usage by 8% in LUTs and 9% in flip-flops, but yields up to a 42% increase in instructions per cycle (IPC), with an overall improvement of 28% in MIPS/LUT. Finally, we analyze the performance tradeoffs of tightly integrating custom instructions into a fixed-pipeline architecture versus a parallel-execution-unit architecture.
