Abstract

High-level synthesis (HLS)-based design methodologies are extremely viable for industries that are sensitive to production costs. In order to have competitive advantage, the ability to have several different implementations of the same algorithm satisfying a diverse range of resolution, cost, and performance constraints is highly desirable. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. A performance comparison of both HLS-based design and a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time is also presented. SGM mainly includes a sequence of three processing steps, i.e., the “cost cube calculation” followed by the “path cost computation” and finally the “disparity approximation and minimization”. The path cost processor further performs a pixel-wise processing of the cost cube data along eight distinct path orientations. The baseline algorithmic model usually called the “golden” model utilizes considerably large arrays that are required to be mapped to an external DRAM and brought into the on-chip RAM when required. This necessitates adding both the memory transfer loops as well as insertion of calls to the AXI transactors for accessing the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. The design space exploration was thus performed by making several considerably different micro-architectural choices. Eventually, we were able to obtain an implementation comparable with the manual RTL design. Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s for the image resolution of $640\times 480$ with a disparity depth of 128 pixels per frame.

Highlights

  • S YSTEM-ON-CHIP (SoC) designs are becoming increasingly heterogeneous as they combine multicore architectures with a variety of hardware accelerators to carry out dedicated computational tasks

  • Design space exploration with High-level Synthesis (HLS) is much broader and easier than what is possible with logic synthesis alone, since the former can be achieved by changing HLS tool directives, while the latter usually requires one to manually change a detailed hardware description expressed in the form of Verilog or VHDL code

  • This article addresses some of the challenges posed by high-level synthesis tools while trying to improve the quality of results (QoR) for hardware implementations from a system-level behavioral model described at a high abstraction level

Read more

Summary

Introduction

S YSTEM-ON-CHIP (SoC) designs are becoming increasingly heterogeneous as they combine multicore architectures with a variety of hardware accelerators to carry out dedicated computational tasks. These hardware accelerators offer several orders of magnitude higher power and timing efficiency than a corresponding software implementation [1]. The presence of accelerators aggravates the complexity of SoC design. With the continuous advancements in technology, the complexity of electronic designs has a profound effect on the overall cost, performance, and power consumption of the modern electronic systems. T. Lazarescu are with the Department of Electronics and Telecommunications, Politecnico di Torino, Italy

Objectives
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call