Abstract

Field-programmable gate array (FPGA) devices boast abundant resources with which custom accelerator components for signal, image, and data processing may be realized; however, realizing high-performance, low-cost accelerators currently demands manual register transfer level design. Software-programmable soft processors have been proposed as a way to reduce this design burden, but they are unable to support performance and cost comparable to custom circuits. This paper proposes a new soft processing approach for FPGA that promises to overcome this barrier. A high-performance, fine-grained streaming processor, known as a streaming accelerator element, is proposed, which realizes accelerators as large-scale custom multicore networks. By adopting a streaming execution approach with advanced program control and memory addressing capabilities, typical program inefficiencies can be almost completely eliminated, enabling performance and cost that are unprecedented among software-programmable solutions. When used to realize accelerators for fast Fourier transform, motion estimation, matrix multiplication, and Sobel edge detection, the proposed architecture is shown to enable real-time operation, with performance and cost comparable with hand-crafted custom circuit accelerators and up to two orders of magnitude beyond existing soft processors.

Highlights

  • Recent years have seen rapid evolution in field-programmable gate array (FPGA) technologies, both in the scale of traditional devices and their extension to system-on-chip FPGA, incorporating heterogeneous multicore processor architectures

  • By ensuring the absolute lowest-cost structure for the FPGA processing element (FPE), economies of scale produce dramatic reductions in multicore resource cost. This extreme focus comes at the cost of flexibility: once synthesized, the FPE is less flexible than a general soft processor because its architecture is highly constrained at design time to support the desired operation with the highest performance and lowest cost. While it may be reprogrammed after synthesis, it cannot enable general-purpose operation in the manner of a standard softcore

  • Soft processors for FPGA suffer from substantial cost and performance penalties relative to custom circuits handcrafted at register transfer level

Summary

INTRODUCTION

Recent years have seen rapid evolution in field-programmable gate array (FPGA) technologies, both in the scale of traditional devices and their extension to system-on-chip FPGA, incorporating heterogeneous multicore processor architectures. These mark FPGA as ideal hosts to high-performance custom computing architectures for signal, image, and data processing [4]. A novel streaming accelerator element (SAE) is presented which enables software-based accelerator development, while maintaining the performance and cost of custom circuits. To the best of our knowledge, the SAE is the highest performance, lowest cost software-programmable component on record for FPGA and the first to enable signal and image processing accelerators with performance and cost comparable with custom circuits.

WANG AND MCALLISTER: STREAMING ELEMENTS FOR FPGA SIGNAL AND IMAGE PROCESSING ACCELERATORS

BACKGROUND
FPGA PROCESSING ELEMENT ACCELERATORS
The FPGA Processing Element
Load-Store PEs
Stream Accelerator Elements
Instruction Coding
Configuration
SAE-Based FFT
STREAMING BLOCK PROCESSING
Zero-Overhead Loop Execution
Block Data Memory Access
Off-SAE Communications
EXPERIMENTS
IP Comparison Case Study
Soft Processor Comparison
Findings
CONCLUSION AND FUTURE WORK
