Abstract

Depth estimation and semantic segmentation are two fundamental tasks in scene understanding. Although the two tasks have complementary properties and are highly correlated, they are usually solved separately. Solving them jointly is very beneficial for real-world applications that require both geometric and semantic information. Within this context, this paper presents a unified learning framework that generates a refined depth map and a semantic segmentation map from a single image. Specifically, it proposes a novel architecture called JDSNet, a Siamese triple-decoder network that simultaneously performs depth estimation, depth refinement, and semantic labeling of a scene by exploiting the interaction between depth and semantic information. JDSNet is trained with a semi-supervised method: geometry-based image reconstruction replaces ground-truth depth labels for the depth estimation task, while ground-truth semantic labels are required for the semantic segmentation task. The KITTI driving dataset is used to evaluate the effectiveness of the proposed approach. The experimental results show that the approach achieves excellent performance on both tasks, indicating that the model can effectively exploit both geometric and semantic information.

Highlights

  • Scene understanding is crucial for autonomous driving systems since it provides a mechanism to understand the scene layout of the environment [1, 2]

  • Addressing depth estimation and semantic segmentation simultaneously, so that the two tasks can benefit from each other, is non-trivial: it is one of the most challenging problems in computer vision, given the peculiarities of each task and the limited information that can be obtained from monocular images

  • As shown in their work, the proposed model achieved comparable results on monocular depth estimation but outperformed the state-of-the-art methods on semantic segmentation
Summary

Introduction

Scene understanding is crucial for autonomous driving systems since it provides a mechanism to understand the scene layout of the environment [1, 2]. For depth estimation, semantic labels provide valuable prior knowledge that depicts the geometric relationships between pixels of different classes and yields a better scene layout [3, 4, 5, 6]. These two fundamental computer-vision tasks can be handled in an integrated manner under a unified framework that optimizes multiple objectives, improving computational efficiency and performance for both tasks from single RGB images. Addressing depth estimation and semantic segmentation simultaneously, so that the two tasks can benefit from each other, is non-trivial: it is one of the most challenging problems in computer vision, given the peculiarities of each task and the limited information that can be obtained from monocular images.
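The semi-supervised objective described in the abstract — geometry-based image reconstruction standing in for depth supervision, plus ordinary supervised cross-entropy for segmentation — can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption, not the paper's actual formulation: the nearest-neighbour horizontal warp from a stereo pair, the L1 photometric term, the weighting factor `lam`, and all function names are hypothetical stand-ins for whatever losses JDSNet actually uses.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at
    x - d(y, x), using nearest-neighbour sampling for simplicity."""
    h, w = right.shape
    recon = np.zeros_like(right)
    for y in range(h):
        for x in range(w):
            src = int(round(x - disparity[y, x]))
            src = min(max(src, 0), w - 1)  # clamp to image bounds
            recon[y, x] = right[y, src]
    return recon

def photometric_loss(left, right, disparity):
    """Self-supervised depth term: mean L1 difference between the
    left image and its reconstruction from the right image."""
    return np.abs(left - warp_right_to_left(right, disparity)).mean()

def cross_entropy(logits, labels):
    """Supervised segmentation term: per-pixel softmax cross-entropy
    against ground-truth class indices (logits: H x W x C)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    h, w, _ = logits.shape
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()

def joint_loss(left, right, disparity, seg_logits, seg_labels, lam=1.0):
    """Combined objective: reconstruction loss needs no depth labels,
    while the segmentation loss requires ground-truth annotations."""
    return (photometric_loss(left, right, disparity)
            + lam * cross_entropy(seg_logits, seg_labels))
```

With a disparity map that exactly aligns the stereo pair, the photometric term vanishes, so gradients on the depth branch come entirely from how well the warp reconstructs the input image rather than from any depth ground truth.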