Abstract

In many applications, such as scene interpretation and reconstruction, accurate depth measurement from images is a significant challenge. Current depth estimation techniques frequently produce blurry, low-resolution estimates. Using transfer learning, this work trains a convolutional neural network to generate a high-resolution depth map from a single RGB image. We adopt a standard encoder-decoder architecture, initializing the encoder with features extracted from high-performing pre-trained networks, together with augmentation and training procedures that lead to more accurate results. We demonstrate that, even with a very simple decoder, our approach can produce complete high-resolution depth maps. A wide range of deep learning approaches has recently been proposed, showing significant promise on this classical ill-posed problem. Experiments are carried out on two widely used public datasets, KITTI and NYU Depth v2. We also examine the errors made by different models in order to expose the shortcomings of current approaches; our method achieves competitive performance on both KITTI and NYU Depth v2.
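The encoder-decoder idea above can be sketched minimally. The following NumPy snippet (the shapes and the nearest-neighbor upsampling are illustrative assumptions, not the paper's exact configuration) shows what one stage of a very basic decoder does: upsample the feature map and merge it with the matching encoder skip connection.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def decoder_stage(feat, skip):
    """One illustrative decoder stage: upsample, then concatenate the
    encoder skip connection along the channel axis."""
    up = upsample2x(feat)
    return np.concatenate([up, skip], axis=-1)

# Hypothetical shapes: an 8x8x64 bottleneck merged with a 16x16x32 skip map.
bottleneck = np.zeros((8, 8, 64))
skip = np.zeros((16, 16, 32))
merged = decoder_stage(bottleneck, skip)
print(merged.shape)  # (16, 16, 96)
```

In a real network each stage would be followed by learned convolutions; the point here is only the repeated upsample-and-merge structure that lets a simple decoder recover full resolution.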

Highlights

  • Commercial depth sensors, such as Kinect and various LiDAR systems, are often used to acquire depth information

  • Since the task is closely related to semantic labeling, most works build on ImageNet Large Scale Visual Recognition Challenge (ILSVRC) models, often initializing their networks with AlexNet or the deeper VGG

  • AutoAugment is pre-learned with sub-policies on ImageNet
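The AutoAugment highlight refers to applying randomly sampled sub-policies at training time. A minimal sketch of that mechanism is below; the sub-policies shown are hypothetical placeholders, whereas AutoAugment's actual ImageNet policies are learned (operation, probability, magnitude) triples.

```python
import random
import numpy as np

# Hypothetical sub-policies for illustration only.
def brightness(img, mag):
    """Scale brightness by a magnitude-dependent factor, clipped to [0, 1]."""
    return np.clip(img * (1.0 + 0.1 * mag), 0.0, 1.0)

def horizontal_flip(img, _mag):
    """Mirror the image along its width axis."""
    return img[:, ::-1]

SUB_POLICIES = [
    [(brightness, 0.8, 3), (horizontal_flip, 0.5, 0)],
    [(horizontal_flip, 0.9, 0), (brightness, 0.3, 5)],
]

def auto_augment(img, rng=random):
    """Pick one sub-policy at random; each op in it fires with its probability."""
    for op, prob, mag in rng.choice(SUB_POLICIES):
        if rng.random() < prob:
            img = op(img, mag)
    return img

img = np.full((4, 4, 3), 0.5)
out = auto_augment(img)
print(out.shape)  # (4, 4, 3)
```

For depth estimation, geometric operations such as flips must also be applied consistently to the ground-truth depth map, which is one reason only a subset of the learned policies transfers directly.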


Introduction

Commercial depth sensors, such as Kinect and various LiDAR systems, are often used to acquire depth information. Earlier techniques depended solely on hand-crafted features and probabilistic models, whereas current approaches rely largely on convolutional neural networks (CNNs) because of their superior performance. Eigen et al. [2] were the first to use a CNN to estimate the depth of a single image; they trained a multi-scale CNN to predict depth, applying deep learning techniques to this type of challenge. Zhao Chen et al. [4] suggested a method for overcoming the limits of monocular depth estimation by giving a depth model a sparse set of recorded depth measurements together with an RGB image to predict the entire depth map. The design of smaller and more energy-efficient depth sensor hardware is preferred. Because of this challenge, many computer vision systems use RGB images obtained from a video stream for camera pose estimation.
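The sparse-depth conditioning idea attributed to Chen et al. [4] can be sketched as follows. This NumPy snippet (the function name, the 5% sampling rate, and the zero-fill convention are illustrative assumptions, not the authors' exact pipeline) builds the 4-channel input such a model would consume: RGB plus a depth channel that is zero wherever no measurement exists.

```python
import numpy as np

def make_sparse_depth_input(rgb, depth, keep_fraction=0.05, rng=None):
    """Stack an (H, W, 3) RGB image with a sparsely sampled depth channel
    (zeros where no measurement is kept), yielding an (H, W, 4) input."""
    rng = rng or np.random.default_rng(0)
    h, w, _ = rgb.shape
    mask = rng.random((h, w)) < keep_fraction  # keep ~5% of depth pixels
    sparse = np.where(mask, depth, 0.0)
    return np.concatenate([rgb, sparse[..., None]], axis=-1)

rgb = np.zeros((8, 8, 3))
depth = np.ones((8, 8))
x = make_sparse_depth_input(rgb, depth)
print(x.shape)  # (8, 8, 4)
```

Feeding even a small number of true depth samples this way constrains the global scale of the prediction, which is exactly the ambiguity a purely monocular model cannot resolve.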


