High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze?

Affaq Qamar, Luciano Lavagno, Fahad Bin Muslim, Mihai Teodor Lazarescu, Francesco Gregoretti

doi:10.1109/access.2016.2635378

Abstract

High-level synthesis (HLS)-based design methodologies are well suited to industries that are sensitive to production costs. To gain a competitive advantage, it is highly desirable to be able to produce several different implementations of the same algorithm that satisfy a diverse range of resolution, cost, and performance constraints. In this paper, we present multiple hardware implementations of the semi-global matching (SGM) algorithm, which is used in stereo vision systems, e.g., for automotive applications. The hardware platform considered in this paper is a Xilinx Zynq system-on-chip. We also compare the HLS-based design with a manual register transfer level (RTL) design in terms of quality of results, flexibility, and design time. SGM consists mainly of a sequence of three processing steps: the “cost cube calculation”, followed by the “path cost computation”, and finally the “disparity approximation and minimization”. The path cost processor performs pixel-wise processing of the cost cube data along eight distinct path orientations. The baseline algorithmic model, usually called the “golden” model, uses large arrays that must be mapped to external DRAM and brought into on-chip RAM when required. This requires adding memory transfer loops and inserting calls to the AXI transactors that access the DRAM through the on-chip DDR slave. Furthermore, the initial algorithm (typically single-threaded) must be parallelized to fully exploit the concurrency offered by the target hardware platform. The design space exploration was therefore performed by making several considerably different micro-architectural choices. Eventually, we obtained an implementation comparable with the manual RTL design. Both the manual RTL and the HLS designs achieved the target real-time performance of 30 frames/s at an image resolution of $640\times 480$ with a disparity depth of 128 pixels.
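
The three SGM steps named in the abstract can be illustrated with a small software sketch. This is not the paper's golden model or its HLS code: it follows the commonly used SGM recurrence, and the absolute-difference cost metric, the penalty values `p1` and `p2`, the border clamping, and all function names are assumptions chosen for illustration.

```python
# Eight path orientations as (dy, dx) steps: horizontal, vertical, diagonal.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0),
              (1, 1), (1, -1), (-1, 1), (-1, -1)]


def cost_cube(left, right, max_disp):
    """Step 1: matching cost for every pixel and candidate disparity.

    Absolute intensity difference is an assumed, illustrative cost metric.
    """
    h, w = len(left), len(left[0])
    cube = [[[0] * max_disp for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for d in range(max_disp):
                xr = max(x - d, 0)  # clamp at the image border (assumption)
                cube[y][x][d] = abs(left[y][x] - right[y][xr])
    return cube


def path_cost(cube, dy, dx, p1, p2):
    """Step 2: aggregate costs pixel by pixel along one path orientation."""
    h, w, nd = len(cube), len(cube[0]), len(cube[0][0])
    agg = [[None] * w for _ in range(h)]
    # Visit pixels so that the predecessor (y - dy, x - dx) is already done.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    inf = float("inf")
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                agg[y][x] = list(cube[y][x])  # path starts at the border
                continue
            prev = agg[py][px]
            prev_min = min(prev)
            row = []
            for d in range(nd):
                best = min(
                    prev[d],                               # same disparity
                    prev[d - 1] + p1 if d > 0 else inf,    # change by one
                    prev[d + 1] + p1 if d + 1 < nd else inf,
                    prev_min + p2,                         # larger jump
                )
                # Subtracting prev_min keeps the values bounded.
                row.append(cube[y][x][d] + best - prev_min)
            agg[y][x] = row
    return agg


def sgm_disparity(left, right, max_disp, p1=1, p2=4):
    """Run all three steps and return a disparity map (list of rows)."""
    cube = cost_cube(left, right, max_disp)
    h, w = len(left), len(left[0])
    total = [[[0] * max_disp for _ in range(w)] for _ in range(h)]
    for dy, dx in DIRECTIONS:
        agg = path_cost(cube, dy, dx, p1, p2)
        for y in range(h):
            for x in range(w):
                for d in range(max_disp):
                    total[y][x][d] += agg[y][x][d]
    # Step 3: pick the disparity with the minimum summed path cost.
    return [[min(range(max_disp), key=lambda d: total[y][x][d])
             for x in range(w)] for y in range(h)]
```

In this sketch the cost cube holds width × height × disparity entries. At the resolution quoted in the abstract that is 640 × 480 × 128 = 39,321,600 entries per frame, which is consistent with the abstract's remark that the golden model's arrays have to live in external DRAM.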

Highlights

System-on-chip (SoC) designs are becoming increasingly heterogeneous as they combine multicore architectures with a variety of hardware accelerators to carry out dedicated computational tasks
Design space exploration with High-level Synthesis (HLS) is much broader and easier than what is possible with logic synthesis alone, since the former can be achieved by changing HLS tool directives, while the latter usually requires one to manually change a detailed hardware description expressed in the form of Verilog or VHDL code
This article addresses some of the challenges posed by high-level synthesis tools while trying to improve the quality of results (QoR) for hardware implementations from a system-level behavioral model described at a high abstraction level

Summary

Introduction

System-on-chip (SoC) designs are becoming increasingly heterogeneous as they combine multicore architectures with a variety of hardware accelerators to carry out dedicated computational tasks. These hardware accelerators offer several orders of magnitude higher power and timing efficiency than a corresponding software implementation [1]. The presence of accelerators, however, adds to the complexity of SoC design. As technology continues to advance, the complexity of electronic designs has a profound effect on the overall cost, performance, and power consumption of modern electronic systems.

… T. Lazarescu are with the Department of Electronics and Telecommunications, Politecnico di Torino, Italy.

Journal: IEEE access : practical innovations, open solutions
Publication Date: Jan 1, 2017
Citations: 12
License type: CC BY 3.0
