Abstract

Depth estimation has become fundamental to computer vision applications such as obstacle avoidance, scene reconstruction, and camera pose estimation. However, the depth maps produced by most existing monocular depth estimation models are blurry and of low resolution, especially in low-textured regions, while models that predict depth more accurately generally require multiple images of the same scene as input. This paper presents a transfer learning approach based on densely connected convolutional networks that takes only a single RGB image as input and produces a high-quality depth prediction. The proposed solution uses an encoder-decoder architecture to extract features from the RGB image and generate the corresponding depth map: a 161-layer densely connected convolutional network (DenseNet-161) serves as the encoder, and the decoder consists of five upsampling blocks followed by one transposed convolutional layer. Evaluation after training on the benchmark NYU Depth V2 dataset (Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor Segmentation and Support Inference from RGBD Images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 746-760 (2012)) shows that the proposed approach, despite its lower-complexity encoder-decoder architecture trained with fewer parameters and iterations, outperforms existing state-of-the-art techniques: the average Root Mean Square Error (RMSE) between predicted and ground-truth depths is 0.505, lower than the RMSE of all the compared monocular depth estimation methods. Furthermore, the depth maps generated by the proposed model are of high resolution and are minimally affected by scene conditions such as wall texture and illumination.
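As a rough illustration of the encoder-decoder design described above, the following PyTorch sketch pairs an ImageNet-pretrained DenseNet-161 feature extractor with five upsampling blocks and a final transposed convolutional layer. This is not the authors' implementation; the decoder channel widths, the bilinear upsampling scheme, and the output resolution are assumptions made for the sake of a self-contained example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class UpBlock(nn.Module):
    """One decoder stage: 2x bilinear upsampling followed by a 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return F.relu(self.conv(x))

class DepthNet(nn.Module):
    """Encoder-decoder sketch: DenseNet-161 encoder, five upsampling blocks,
    and one transposed convolutional layer emitting a 1-channel depth map."""
    def __init__(self):
        super().__init__()
        # Transfer learning: ImageNet-pretrained DenseNet-161 as the encoder.
        # Its feature extractor outputs 2208 channels at 1/32 input resolution.
        self.encoder = models.densenet161(weights="DEFAULT").features
        widths = [2208, 1024, 512, 256, 128, 64]  # assumed decoder widths
        self.up_blocks = nn.ModuleList(
            UpBlock(widths[i], widths[i + 1]) for i in range(5)
        )
        # Final transposed convolution maps features to a single depth channel
        # (stride 1 here preserves the resolution reached by the five blocks).
        self.head = nn.ConvTranspose2d(widths[-1], 1, kernel_size=3, padding=1)

    def forward(self, rgb):
        x = self.encoder(rgb)        # (B, 2208, H/32, W/32)
        for up in self.up_blocks:    # five 2x stages: back to (B, 64, H, W)
            x = up(x)
        return self.head(x)          # (B, 1, H, W) predicted depth map

model = DepthNet().eval()
with torch.no_grad():
    depth = model(torch.randn(1, 3, 224, 224))
print(depth.shape)  # torch.Size([1, 1, 224, 224])
```

Training such a model on NYU Depth V2 would then minimize a depth-regression loss (RMSE is the metric reported in the paper) between the predicted and ground-truth depth maps.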
