Abstract

Predicting depth is crucial to understanding the 3D geometry of a scene. While "for stereo images, local correspondence suffices for estimation, finding depth from a single image is less straightforward, requiring integration of both global and local information". In this chapter, we address depth estimation with a Convolutional Neural Network (CNN), using transfer learning as a baseline. We propose a relatively shallow CNN that uses transfer learning to extract low-level features. We employ a fully convolutional architecture in which low-level image features are first extracted by pre-trained ResNet-50 and VGG19 networks. Transfer learning is performed by taking the initial layers of ResNet-50 and VGG19 and connecting them in parallel, with no downsampling anywhere in the proposed architecture, so that the full size of the depth map is preserved. Because the architecture is not deep, the network can perform inference on images in real time. We compare our approach against a pure CNN, which illustrates the effectiveness of transfer learning: the deep CNN with transfer learning yields better results than the pure CNN and converges faster. Moreover, we show the influence of the choice of loss function on training. We compare the approaches using both qualitative visualizations and quantitative metrics.
