Abstract

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. These two tasks are commonly addressed independently, but the idea of merging them into a single framework has recently been studied, under the assumption that two highly correlated tasks can benefit each other and improve estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image under a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Likewise, our approaches are evaluated under two different scenarios designed to compare our results against single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms state-of-the-art single-task approaches, while obtaining competitive results compared with other multi-task methods.

Highlights

  • Semantic segmentation and depth information are intrinsically related, and both pieces of information need to be considered in an integrated manner to succeed in challenging applications, such as robotics [1] or autonomous navigation [2]

  • The performance of HybridNet A1 is even worse than that of the single-task method DeepLab-ASPP, which indicates that the benefit of unifying two single tasks in a hybrid architecture can hardly be achieved by sharing the feature-extraction process in more complex indoor scenes

  • We have introduced a methodology for depth estimation and semantic segmentation from a single image using a unified convolutional network


Summary

Introduction

Semantic segmentation and depth information are intrinsically related, and both pieces of information need to be considered in an integrated manner to succeed in challenging applications, such as robotics [1] or autonomous navigation [2]. Deep learning techniques have shown extraordinary success in both tasks [3] in recent years. In this context, the feature-extraction process for a specific task is modeled as a parameter estimation problem in Convolutional Neural Networks (CNNs), based on a set of training data. Merging depth estimation and semantic segmentation into a sole structure is motivated by the fact that both segmentation information and depth maps represent geometrical information of a scene. In this manner, the feature extractors can be better trained due to the enriched prior knowledge. One of the main advantages of the proposed approach is the straightforward manner in which the semantic segmentation and the depth map are estimated from a single image, providing a feasible solution to these problems.

(Sensors 2019, 19, 1795)
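The core idea of such a unified network is hard parameter sharing: one feature extractor feeds two task-specific heads, one regressing per-pixel depth and one predicting per-pixel class labels. The following is a minimal numpy sketch of that structure only, not the paper's actual architecture; all function and weight names here are hypothetical, and the per-pixel linear maps stand in for the real convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(image, w):
    # Shared feature extraction (stand-in for the CNN backbone):
    # one linear map per pixel with a ReLU, (H, W, 3) -> (H, W, F).
    return np.maximum(image @ w, 0.0)

def depth_head(features, w):
    # Depth estimation head: per-pixel regression to one depth
    # value, (H, W, F) -> (H, W).
    return (features @ w).squeeze(-1)

def segmentation_head(features, w):
    # Semantic segmentation head: per-pixel class logits,
    # (H, W, F) -> (H, W, C); argmax gives the label map.
    logits = features @ w
    return logits.argmax(axis=-1)

# Toy sizes: 4x4 image, 8 shared features, 5 semantic classes.
H, W, F, C = 4, 4, 8, 5
image = rng.standard_normal((H, W, 3))
w_enc = rng.standard_normal((3, F))    # shared by both tasks
w_depth = rng.standard_normal((F, 1))  # depth-specific
w_seg = rng.standard_normal((F, C))    # segmentation-specific

feats = shared_encoder(image, w_enc)   # computed once, used twice
depth_map = depth_head(feats, w_depth)
label_map = segmentation_head(feats, w_seg)
```

Because `feats` is computed once and consumed by both heads, gradients from both losses would flow into `w_enc` during training; the paper's two architectures differ precisely in how much of this extraction stage is shared versus kept task-specific.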

Related Work
Semantic Segmentation
Depth Estimation
Multi-Task Approaches
Our Proposal
Hybrid Convolutional Framework
Depth Estimation Network
Semantic Segmentation Network
Training Details
Experiments
Road Scene
Indoor Scene
Comparison with Other Hybrid Architectures
Findings
Conclusions and Future Work
