Abstract

Understanding the 3D structure of a scene is essential for improving the precision of intelligent autonomous systems. The difficulty is compounded when only a single image of the scene is available. To this end, we propose a fully convolutional deep framework that predicts both the depth map and the surface normals from a single RGB image within a common architecture. A DenseNet CNN backbone is employed to learn the complex mapping between an input RGB image and its corresponding 3D primitives. We introduce a novel multi-stage cascaded deconvolution scheme, in which the output feature maps of each dense block are reused by concatenating them with the feature maps of the corresponding deconvolution block. These combined feature maps are propagated through the deep network in a pre-activated manner to construct the final output. The network is trained separately for depth and surface-normal estimation while keeping the architecture unchanged. Compared to its counterparts, the proposed architecture uses fewer training samples and model parameters. Exhaustive experiments on a benchmark dataset not only reveal the efficacy of the proposed multi-stage scheme over one-way sequential deconvolution but also show that it outperforms state-of-the-art methods.
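
To make the cascaded-deconvolution idea concrete, the sketch below shows one decoder stage in PyTorch: the decoder features are upsampled by a transposed convolution, concatenated with the matching dense-block features from the encoder, and passed through a pre-activated (BN, ReLU, conv) block. The framework choice, module names, and channel sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PreActConv(nn.Module):
    """Pre-activation block: BN -> ReLU -> 3x3 conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_ch)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(torch.relu(self.bn(x)))

class CascadedDeconvStage(nn.Module):
    """One decoder stage: deconvolve, then fuse the encoder skip features."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Transposed convolution doubles the spatial resolution.
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # Pre-activated conv applied after concatenating the dense-block features.
        self.fuse = PreActConv(out_ch + skip_ch, out_ch)

    def forward(self, x, skip):
        x = self.deconv(x)
        # Reuse encoder feature maps via channel-wise concatenation.
        x = torch.cat([x, skip], dim=1)
        return self.fuse(x)

# Illustrative usage with assumed channel counts and spatial sizes:
stage = CascadedDeconvStage(in_ch=512, skip_ch=256, out_ch=256)
decoder_feat = torch.randn(1, 512, 15, 20)   # coarse decoder features
encoder_feat = torch.randn(1, 256, 30, 40)   # matching dense-block output
out = stage(decoder_feat, encoder_feat)
print(out.shape)  # torch.Size([1, 256, 30, 40])
```

Stacking several such stages, each fed by the corresponding encoder dense block, would realize the multi-stage cascade described above, as opposed to a one-way sequential deconvolution path that discards the encoder features.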
