Abstract

Binocular vision and convolutional neural networks (CNNs) are widely used in modern intelligent vision processing systems, such as robotics, autonomous vehicles, and AR gadgets. However, both the classic semiglobal matching (SGM) algorithm and deep CNNs require substantial computing resources to reach their performance goals. Traditional embedded CPUs and graphics processing units (GPUs) cannot simultaneously meet the processing speed and energy requirements, while specialized circuits dedicated to SGM and CNN processing, respectively, incur considerable hardware and development costs. Meanwhile, with the popularity of deep learning, neural processing units (NPUs) have become prevalent in many embedded and edge devices, offering high-throughput computing power for the matrix operations involved in neural networks. In this work, we take advantage of the neural processing architectures integrated in SoC chips to accelerate the SGM process, so that existing hardware resources are better utilized instead of investing additional resources in specialized SGM components. To this end, this letter first deploys SGM on an NPU by converting the incompatible operations into the neural-computing flow, and a configurable neural processing element is proposed to flexibly support various vector operation sequences. Then, a hybrid dataflow scheduler and the corresponding hardware modifications are introduced to accelerate the cost processing, improving hardware utilization and reducing on-chip memory footprint and access. Our solution runs at 45 fps for an image size of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$640\times 480$ </tex-math></inline-formula> , with 128 disparity levels.
Its speed-energy efficiency is <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$52\times $ </tex-math></inline-formula> better than that of the GPU (Jetson TX1) solution, with negligible additional hardware overhead and accuracy loss.
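For reference, the scanline cost-aggregation recurrence at the heart of SGM — the part the letter maps onto the NPU's vector pipeline — can be sketched as follows. This is an illustrative NumPy sketch of the standard SGM formulation (penalty parameters `P1`/`P2` and the left-to-right path direction are textbook conventions, not details taken from this letter); the per-pixel update is a small vector minimum over disparity candidates, exactly the kind of operation a neural processing element can batch.

```python
import numpy as np

def sgm_path_costs(cost, P1=10.0, P2=120.0):
    """Aggregate matching costs along one left-to-right scanline (classic SGM).

    cost: (W, D) array of per-pixel matching costs for one image row,
          W pixels by D disparity candidates.
    Returns the aggregated path costs L with the same (W, D) shape.
    """
    W, D = cost.shape
    L = np.empty((W, D), dtype=np.float64)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        min_prev = prev.min()
        # Neighbouring-disparity candidates carry the small penalty P1.
        shift_minus = np.concatenate(([np.inf], prev[:-1])) + P1
        shift_plus = np.concatenate((prev[1:], [np.inf])) + P1
        # Any larger disparity jump carries the large penalty P2.
        candidates = np.minimum.reduce(
            [prev, shift_minus, shift_plus, np.full(D, min_prev + P2)]
        )
        # Subtracting min_prev keeps the accumulated costs bounded.
        L[x] = cost[x] + candidates - min_prev
    return L
```

A full SGM pass repeats this recurrence over several path directions and sums the results before the winner-take-all disparity selection; the letter's contribution is executing such vector sequences on NPU hardware rather than on a dedicated SGM circuit.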
