Abstract

Single image depth estimation works fail to separate foreground elements because they can easily be confounded with the background. To alleviate this problem, we propose the use of a semantic segmentation procedure that adds information to a depth estimator, in this case, a 3D Convolutional Neural Network (CNN)—segmentation is coded as one-hot planes representing categories of objects. We explore 2D and 3D models. Particularly, we propose a hybrid 2D–3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained , which is an improvement of 0.14 points (compared with the state of the art of ) by using manual segmentation, and using automatic semantic segmentation, proving that depth estimation is improved when the shape and position of objects in a scene are known.

Highlights

  • Depth estimation from a single image consists of calculating the distance between the objects in an image to the user’s point of view

  • We explored 2D, 3D Convolutional Neural Networks (CNN) models, and a hybrid 2D–3D CNN model capable of obtaining semantic segmentation and depth estimation at the same time

  • We found that helping the CNN models with additional information such as One-Hot Encoded Semantic Segmentation, aids in separating objects, and obtaining a better depth estimation: Knowing the shape and position of objects in the scene, a CNN can estimate their depth distance with greater accuracy

Read more

Summary

Introduction

Depth estimation from a single image consists of calculating the distance between the objects in an image to the user’s point of view. This distance is calculated through a pair of images obtained from both eyes (binocular vision) by using the overlap between the field of view of both eyes [1]. Once we identify the objects in the image, we proceed to estimate depth using this information To carry out these objectives, we propose the use of 2D and 3D Convolutional Neural Networks (CNN) trained with a synthetic dataset, containing both semantic segmentation and depth information, as well to explore a hybrid 2D–3D CNN model capable of estimating depth from a single image, while at the same time, segment objects found in it

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call