Abstract

Estimating the depth map from a single RGB image is important for understanding the nature of the terrain in robot navigation and has attracted considerable attention over the past decade. Existing approaches can accurately estimate depth from a single RGB image when the environment is highly structured; the problem becomes far more challenging when the terrain is highly dynamic. We propose a fine-tuned generative adversarial network to estimate the depth map effectively for a given single RGB image. The proposed network is composed of a fine-tuned generator and a global discriminator. The encoder part of the generator takes input RGB images and depth maps and generates their joint distribution in the latent space; the decoder part of the generator then decodes the depth map from this joint distribution. The discriminator takes real and fake pairs in three different configurations and guides the generator to estimate the depth map from the given RGB image accordingly. Finally, we conducted extensive experiments on a highly dynamic environment dataset to verify the effectiveness and feasibility of the proposed approach. The proposed approach decodes the depth map from the joint distribution more effectively and accurately than existing approaches.
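The encoder-decoder generator described above can be pictured with a minimal PyTorch sketch. Everything here (layer counts, channel sizes, activations) is an illustrative assumption rather than the authors' exact architecture; the sketch only shows the data flow: the RGB image and depth map are concatenated, encoded into a joint latent code, and the decoder reconstructs a depth map from that code.

```python
# Minimal sketch of the encoder-decoder generator. All layer sizes and
# module choices are assumptions for illustration, not the paper's design.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        # Encoder: takes the RGB image (3 ch) and depth map (1 ch) jointly
        # and maps them into a shared latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, latent_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        # Decoder: reconstructs a one-channel depth map from the latent code.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, rgb, depth):
        z = self.encoder(torch.cat([rgb, depth], dim=1))  # joint latent code
        return self.decoder(z)                            # estimated depth map
```

Note that this sketch forms the latent code from an (RGB, depth) pair, as in training; how the depth map is decoded from the joint distribution at test time, when only the RGB image is available, is a detail of the full method not captured here.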

Highlights

  • Depth estimation from a single image has a long history owing to its application in computer vision and robot navigation, both for indoor and outdoor environments

  • Our approach aims to perform an accurate translation of RGB images into their corresponding depth maps with a fine-tuned generative adversarial network

  • We perform a comparative analysis of the proposed approach against the conditional generative adversarial network (cGAN)-based approach [31], BA-DualAE [29], the consistent image-to-image translation network (CITN) [28], MSDN [13], and FCN [36] on three datasets: the RealSense depth dataset [37], Cityscapes [15], and the NYU dataset [16]


Summary

INTRODUCTION

Depth estimation from a single image has a long history owing to its applications in computer vision and robot navigation, both for indoor and outdoor environments. Several recent studies have posed monocular depth estimation as a supervised learning problem to overcome the limitations of the aforementioned approaches [12,13,14]. These approaches attempt to regress the depth of each pixel in an image directly, using network models trained on a large amount of depth data. For highly dynamic and uneven terrain, however, estimating depth from a single RGB image remains difficult for unsupervised regression-based approaches, as the corresponding target changes continuously. To address depth estimation over such dynamic terrain, we propose a unified fine-tuned generator-based conditional adversarial network that translates a single RGB image into its corresponding depth map and handles the dynamic environment effectively.
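To make the conditional-adversarial setup concrete, the following is a hedged sketch of one training step. The discriminator scores (RGB, depth) pairs; the paper feeds it real and fake pairs in three different configurations, which this sketch collapses into the single standard real-vs-fake pairing. The PatchGAN-style discriminator and all sizes are assumptions, not the paper's specification.

```python
# Hedged sketch of one conditional-GAN training step: the discriminator
# scores (RGB, depth) pairs, and the generator is updated so that its
# estimated depth passes as real. The paper's three pair configurations
# are reduced here to the basic real-vs-fake pair for brevity.
import torch
import torch.nn as nn

disc = nn.Sequential(  # PatchGAN-style conditional discriminator (assumed)
    nn.Conv2d(4, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=1, padding=1),
)
bce = nn.BCEWithLogitsLoss()

def d_step(rgb, real_depth, fake_depth):
    # Real pair should score high, fake pair low; detach the fake depth
    # so this loss only updates the discriminator.
    real_score = disc(torch.cat([rgb, real_depth], dim=1))
    fake_score = disc(torch.cat([rgb, fake_depth.detach()], dim=1))
    return (bce(real_score, torch.ones_like(real_score))
            + bce(fake_score, torch.zeros_like(fake_score)))

def g_step(rgb, fake_depth):
    # Generator loss: make the fake (RGB, estimated depth) pair score as real.
    fake_score = disc(torch.cat([rgb, fake_depth], dim=1))
    return bce(fake_score, torch.ones_like(fake_score))
```

Conditioning the discriminator on the RGB image (rather than on the depth map alone) is what forces the generator to produce a depth map consistent with that particular input image, which is the core of the image-to-depth translation objective.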

Joint latent space
Stability of the training
RELATED WORK
PROPOSED APPROACH
NETWORK ARCHITECTURE AND TRAINING
RESULTS
ANALYSIS BASED ON REALSENSE DEPTH DATASET
ANALYSIS BASED ON NYU DATASET
CONCLUSION
