Abstract

Depth estimation is a crucial and fundamental problem in computer vision. Conventional methods reconstruct scenes using feature points extracted from multiple images; because they require several views, these approaches are difficult to apply in real-time applications. Hardware-based approaches using 3D sensors, meanwhile, require expensive special equipment. Software-based methods that estimate depth from a single image using machine learning or deep learning are therefore emerging as alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment such as an infrared sensor or a multi-view camera. Because depth values are continuous in nature and can produce locally ambiguous results, we apply pixel-wise prediction with ordinal depth range classification. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by shrinking the network layers. The proposed L-ENet quickly generates an accurate depth map from a single image, yielding depth values close to the ground truth with small errors. Experiments confirmed that L-ENet achieves a significantly improved estimation performance over state-of-the-art algorithms for depth estimation from a single image.
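
The ordinal depth range classification mentioned above turns continuous depth prediction into a per-pixel classification over ordered depth bins. The sketch below is a minimal illustration of one common spacing-increasing (log-space) discretization and its inverse, not the paper's actual implementation; the function names, bin count, and depth range are assumptions chosen for an indoor-scale example.

    import numpy as np

    def depth_to_ordinal_labels(depth, d_min=0.7, d_max=10.0, num_bins=80):
        # Clip to the assumed working range, then apply log-spaced thresholds:
        # finer bins near the camera, coarser bins far away.
        depth = np.clip(np.asarray(depth, dtype=np.float64), d_min, d_max)
        labels = num_bins * np.log(depth / d_min) / np.log(d_max / d_min)
        return np.clip(labels.astype(np.int64), 0, num_bins - 1)

    def ordinal_labels_to_depth(labels, d_min=0.7, d_max=10.0, num_bins=80):
        # Invert the discretization: map each bin index back to the
        # geometric center of its depth interval.
        t = (np.asarray(labels, dtype=np.float64) + 0.5) / num_bins
        return d_min * (d_max / d_min) ** t

Round-tripping a depth map through these two functions shows the quantization error that the classification formulation trades for more stable per-pixel predictions: the recovered depth stays within one bin width of the input.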

Highlights

  • Depth estimation from objects or scenes has been studied for a long time in the computer vision field and has been applied in various applications, such as 3D modeling, computer graphics, virtual reality, augmented reality, and autonomous driving

  • Unlike conventional methods, which aim to accurately predict the 3D depth of nearby objects and require physical equipment such as an infrared sensor or multi-view camera, the present study proposes an optimized lightweight neural network algorithm that quickly generates a depth map from a single image taken in an unstructured indoor or outdoor environment, under various illumination conditions and camera distances

  • To measure the depth estimation performance of the proposed lightweight efficient neural network (L-ENet), experiments were conducted using the NYU Depth v2 dataset [36], which consists of images captured in indoor environments, and the KITTI dataset [37], which contains images captured in outdoor environments

Introduction

Depth estimation from objects or scenes has been studied for a long time in the computer vision field and has been applied in various applications, such as 3D modeling, computer graphics, virtual reality, augmented reality, and autonomous driving. There are two general approaches to obtaining the depth of an object or real-world scene, passive and active, both of which are widely used. In the passive approach, the intrinsic and extrinsic characteristics of the cameras provide the depth of the scene and its 3D real-world coordinates [1]. This method predicts 3D depth information by applying stereo matching to images of an object captured simultaneously by two or more cameras. It is difficult to generate accurate depth information when the baseline distance and angle between the cameras are misaligned [2,3]
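
To make the stereo relation above concrete: for a calibrated, rectified stereo pair, depth follows from disparity as Z = f·B/d, where f is the focal length in pixels, B is the camera baseline, and d is the per-pixel disparity. The sketch below only illustrates this classical passive method and is not part of the proposed L-ENet; the focal length and baseline defaults are placeholder assumptions loosely modeled on a KITTI-style rig.

    import numpy as np

    def disparity_to_depth(disparity, focal_px=721.0, baseline_m=0.54):
        # Z = f * B / d for a rectified stereo pair. Expects a disparity
        # map of shape (H, W) in pixels; non-positive disparities (no
        # match found) are mapped to infinity, i.e. no depth estimate.
        disparity = np.asarray(disparity, dtype=np.float64)
        depth = np.full_like(disparity, np.inf)
        valid = disparity > 0
        depth[valid] = focal_px * baseline_m / disparity[valid]
        return depth

The formula also makes the paragraph's failure mode visible: if the assumed baseline or rectification is off, the same disparity maps to the wrong depth everywhere in the image.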
