Abstract
Estimating the depth map from a single RGB image is important for understanding the terrain in robot navigation and has attracted considerable attention in the past decade. Existing approaches can accurately estimate depth from a single RGB image, provided the environment is highly structured. The problem becomes more challenging when the terrain is highly dynamic. We propose a fine-tuned generative adversarial network to estimate the depth map effectively from a given single RGB image. The proposed network is composed of a fine-tuned generator and a global discriminator. The encoder part of the generator takes RGB images and depth maps as input and generates their joint distribution in the latent space. The decoder part of the generator then decodes the depth map from this joint distribution. The discriminator takes real and fake pairs in three different configurations and guides the generator to estimate the depth map from the given RGB image accordingly. Finally, we conducted extensive experiments on a highly dynamic environment dataset to verify the effectiveness and feasibility of the proposed approach. The proposed approach decodes the depth map from the joint distribution more effectively and accurately than existing approaches.
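The training objective implied by this setup pairs an adversarial term (the discriminator scoring real RGB-depth pairs against RGB-generated pairs) with a reconstruction term on the generated depth map. The sketch below shows one common way to write such a conditional-GAN objective, in the pix2pix style; it is an illustrative assumption, not the paper's exact loss, and the function name, the L1 reconstruction term, and the weight `lam` are hypothetical.

```python
import numpy as np

def cgan_losses(d_real, d_fake, depth_true, depth_fake, lam=100.0):
    """Sketch of a conditional-GAN objective for depth estimation
    (assumed pix2pix-style form, not the authors' exact loss).

    d_real     -- discriminator scores on real (RGB, true depth) pairs, in (0, 1)
    d_fake     -- discriminator scores on fake (RGB, generated depth) pairs, in (0, 1)
    depth_true -- ground-truth depth map
    depth_fake -- depth map produced by the generator
    lam        -- weight of the L1 reconstruction term (hypothetical value)
    """
    eps = 1e-8  # avoid log(0)
    # Discriminator: push real-pair scores toward 1 and fake-pair scores toward 0.
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator: non-saturating adversarial term plus L1 reconstruction of depth.
    g_adv = -np.mean(np.log(d_fake + eps))
    g_l1 = lam * np.mean(np.abs(depth_true - depth_fake))
    return d_loss, g_adv + g_l1

# Toy usage with scalar scores and a small depth map.
d_loss, g_loss = cgan_losses(
    d_real=np.array([0.9]),
    d_fake=np.array([0.1]),
    depth_true=np.zeros((4, 4)),
    depth_fake=np.zeros((4, 4)),
)
```

In this formulation the discriminator loss falls as it separates real from generated pairs, while the generator trades off fooling the discriminator against staying close to the true depth map.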
Highlights
Depth estimation from a single image has a long history owing to its application in computer vision and robot navigation, both for indoor and outdoor environments
Our approach aims to perform an accurate translation of the RGB images into their corresponding depth maps
We perform a comparative analysis of the proposed approach against the conditional generative adversarial network (cGAN)-based approach [31], BA-DualAE [29], the consistent image-to-image translation network (CITN) [28], MSDN [13], and FCN [36] on three datasets: the RealSense depth dataset [37], Cityscapes [15], and the NYU dataset [16]
Summary
Depth estimation from a single image has a long history owing to its applications in computer vision and robot navigation, both indoors and outdoors. Several recent studies have posed monocular depth estimation as a supervised learning problem to overcome the limitations of the aforementioned approaches [12,13,14]. These approaches attempt to regress the depth of each pixel in an image directly, using network models trained on large amounts of depth data. For highly dynamic and uneven terrain, estimating depth from a single RGB image is difficult for unsupervised regression-based approaches, as the corresponding target changes continuously. To address the problem of translating a single RGB image to its corresponding depth map, we propose a unified fine-tuned generator-based conditional adversarial network that handles dynamic environments effectively.