Abstract

This article describes a pipeline synthesis and optimization technique that increases data throughput of FPGA-based system using minimum pipeline resources. The technique is applied on CAL dataflow language, and designed based on relations, matrices, and graphs. First, the initial as-soon-as-possible (ASAP) and as-late-as-possible (ALAP) schedules, and the corresponding mobility of operators are generated. From this, operator coloring technique is used on conflict and nonconflict directed graphs using recursive functions and explicit stack mechanisms. For each feasible number of pipeline stages, a pipeline schedule with minimum total register width is taken as an optimal coloring, which is then automatically transformed to a description in CAL. The generated pipelined CAL descriptions are finally synthesized to hardware description languages for FPGA implementation. Experimental results of three video processing applications demonstrate up to 3.9× higher throughput for pipelined compared to non-pipelined implementations, and average total pipeline register width reduction of up to 39.6 and 49.9% between the optimal, and ASAP and ALAP pipeline schedules, respectively.

Highlights

  • Data throughput is one of the most important parameters in video processing systems

  • For algorithms that can be performed in parallel, such as the case with most digital signal processing (DSP) applications, parallel platforms such as multi-core CPU, many-core GPU, and FPGA generally results in higher throughput compared to traditional single-core systems

  • Each design starts with an initial single CAL actor description, automatically pipelined using our tools to obtain multiple-actor description, and synthesized to hardware description languages (HDL)

Read more

Summary

Introduction

Data throughput is one of the most important parameters in video processing systems. It is essentially a measure of how fast data passes from input to output of a system. For algorithms that can be performed in parallel, such as the case with most digital signal processing (DSP) applications, parallel platforms such as multi-core CPU, many-core GPU, and FPGA generally results in higher throughput compared to traditional single-core systems. Among these parallel platforms, FPGA systems allow the most parallel operations with the highest flexibility for programming parallel cores. As time-to-market window continues to shrink, a new high-level program that synthesizes to efficient parallel hardware is required to manage complexity and increase productivity

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call