Abstract

Most recent high-resolution depth-estimation algorithms are too computationally expensive to run in real time, so the common workaround is to feed the network a low-resolution input image. We propose a different approach: an efficient, real-time depth-estimation algorithm based on a convolutional neural network that takes a single high-resolution image as input. The proposed method constructs a high-resolution depth map using a small encoding architecture and eliminates the need for the decoder typically found in the encoder–decoder architectures employed for depth estimation. It adopts a modified MobileNetV2, a lightweight architecture, and estimates depth through depth-to-space image construction, an operation generally employed in image super-resolution. As a result, it achieves fast frame processing and can predict high-accuracy depth in real time. We train and test our method on the challenging KITTI, Cityscapes, and NYUV2 depth datasets. The proposed method achieves a low relative absolute error (0.028 for KITTI, 0.167 for Cityscapes, and 0.069 for NYUV2) while running at speeds reaching 48 frames per second on a GPU and 20 frames per second on a CPU for high-resolution test images. We compare our method with state-of-the-art depth-estimation methods and show that it outperforms them while using a less complex architecture and running in real time.
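To illustrate the depth-to-space construction mentioned above, the following is a minimal, framework-free sketch of the rearrangement itself. It is not the authors' implementation: the function name `depth_to_space`, the nested-list tensor layout, and the channel ordering (the one used by sub-pixel convolution / pixel shuffle in super-resolution) are assumptions made for illustration. The idea is that a low-resolution feature map with C·r² channels is rearranged into C channels at r times the height and width, so no learned decoder is needed for upsampling.

```python
def depth_to_space(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Assumed channel ordering (pixel-shuffle style): input channel
    c*r*r + i*r + j supplies the pixel at offset (i, j) inside each
    r x r output block of output channel c.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)

    # Allocate the upsampled output, filled with zeros.
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(r):          # row offset inside the r x r block
            for j in range(r):      # column offset inside the r x r block
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Hypothetical example: 4 feature channels of size 1x2 become
# a single-channel 2x4 map with an upscale factor of r = 2.
features = [[[1, 5]], [[2, 6]], [[3, 7]], [[4, 8]]]
depth_map = depth_to_space(features, 2)
```

In this sketch each group of four channel values at one spatial location fills one 2×2 block of the output, which is why a small encoder producing enough channels can emit a full-resolution map in one step. Deep-learning frameworks provide built-in versions of this operation, though their channel orderings can differ from the one assumed here.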

Highlights

  • In computer vision, depth estimation is one of the key tasks employed in numerous applications such as 3D scene construction and understanding, medical 3D imaging and scanning, background/foreground separation, depth perception in self-driving cars and robots, and 3D graphics

  • The latest research on depth estimation has demonstrated that algorithms based on convolutional neural networks (CNNs) can estimate depth with high accuracy; however, most recent studies [1–16] have not considered processing speed or the optimization of such models for embedded systems and low-resource devices with limited memory and processing ability

  • We focus on monocular depth estimation (MDE), which predicts depth from a single RGB image, rather than stereo depth estimation (SDE)


Summary

Introduction

Received: 2 December 2021; Accepted: 14 February 2022

In computer vision, depth estimation is one of the key tasks employed in numerous applications such as 3D scene construction and understanding, medical 3D imaging and scanning, background/foreground separation, depth perception in self-driving cars and robots, and 3D graphics. The need for high-speed computer vision has increased due to the requirement for fast processing on embedded devices and smartphones, in applications such as self-driving cars and real-time 3D reconstruction. Such high-speed processing requires lightweight and memory-efficient computer-vision algorithms based on modern convolutional neural networks (CNNs). The latest research on depth estimation has demonstrated that CNN-based algorithms can estimate depth with high accuracy; however, most recent studies [1–16] have not considered processing speed or the optimization of such models for embedded systems and low-resource devices with limited memory and processing ability. The latest depth-estimation CNN models generally depend on encoder–decoder architecture for

