Abstract
We address the problem of depth estimation from a single monocular image in the paper. Depth estimation from a single image is an ill-posed and inherently ambiguous problem. In the paper, we propose an encoder-decoder structure with the feature pyramid to predict the depth map from a single RGB image. More specifically, the feature pyramid is used to detect objects of different scales in the image. The encoder structure aims to extract the most representative information from the original image through a series of convolution operations and to reduce the resolution of the input image. We adopt Res2-50 as the encoder to extract important features. The decoder section uses a novel upsampling structure to improve the output resolution. Then, we also propose a novel loss function that adds gradient loss and surface normal loss to the depth loss, which can predict not only the global depth but also the depth of fuzzy edges and small objects. Additionally, we use Adam as our optimization function to optimize our network and speed up convergence. Our extensive experimental evaluation proves the efficiency and effectiveness of the method, which is competitive with previous methods on the Make3D dataset and outperforms state-of-the-art methods on the NYU Depth v2 dataset.
Highlights
Estimating the dense and accurate depth of a scene from a single RGB image is one of the fundamental problems of computer vision and essential for various applications, such as scene understanding [1]–[4], 3D modeling [5], [6], robotics [7], [8], virtual reality [9], and autonomous driving [10]
Given the training set RGB image and the corresponding depth map of the image, depth prediction can be regarded as a pixel-level regression problem; that is, the model directly learns to predict the depth corresponding to each pixel in the single image
We propose a novel method for monocular depth estimation
Summary
Estimating the dense and accurate depth of a scene from a single RGB image is one of the fundamental problems of computer vision and essential for various applications, such as scene understanding [1]–[4], 3D modeling [5], [6], robotics [7], [8], virtual reality [9], and autonomous driving [10]. Estimating the depth from a single image is an ill-posed and inherent ambiguous problem. Previous studies have shown that depth estimation, similar to other pixel-level classification or regression tasks, can be performed using the convolutional neural networks (CNNs) model. We present a novel approach for estimating depth from a single image. M. Tang et al.: Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image the different methods used for depth estimation in the past.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have