Multi-view deep learning for consistent semantic mapping with RGB-D cameras

Lingni Ma, Jorg Stuckler, Christian Kerl, Daniel Cremers

doi:10.1109/iros.2017.8202213

Abstract

Visual scene understanding is an important capability that enables robots to act purposefully in their environment. In this paper, we propose a novel approach to object-class segmentation from multiple RGB-D views using deep learning. We train a deep neural network in a semi-supervised way to predict object-class semantics that are consistent across several viewpoints. At test time, the semantic predictions of our network can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperform single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.
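
The warp-and-fuse idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with intrinsics `K = (fx, fy, cx, cy)`, a known 4x4 rigid transform `T` from each source view to the keyframe (for example from RGB-D SLAM), nearest-pixel warping without occlusion handling, and fusion by summing log-probabilities (one common choice under an independence assumption; the abstract does not specify the fusion rule). The function names `warp_pixel` and `fuse_into_keyframe` are hypothetical.

```python
import math


def warp_pixel(u, v, depth, K, T):
    """Map pixel (u, v) with the given depth from a source view into the keyframe.

    K is (fx, fy, cx, cy); T is a 4x4 rigid transform (nested lists) taking
    points from the source camera frame to the keyframe camera frame.
    Returns integer keyframe coordinates (uk, vk), or None if the point
    ends up behind the keyframe camera.
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in the source camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Apply the rigid-body transform into the keyframe camera frame.
    xk = T[0][0] * x + T[0][1] * y + T[0][2] * z + T[0][3]
    yk = T[1][0] * x + T[1][1] * y + T[1][2] * z + T[1][3]
    zk = T[2][0] * x + T[2][1] * y + T[2][2] * z + T[2][3]
    if zk <= 0:
        return None
    # Project into the keyframe image and snap to the nearest pixel.
    return (round(fx * xk / zk + cx), round(fy * yk / zk + cy))


def fuse_into_keyframe(key_probs, views, K, eps=1e-6):
    """Fuse per-pixel class probabilities from several views into a keyframe.

    key_probs[v][u] is a list of class probabilities for the keyframe.
    views is a list of (probs, depth, T) tuples, one per source view, with
    probs and depth indexed as [v][u]. Log-probabilities are accumulated per
    keyframe pixel and renormalised at the end. Occlusions are ignored.
    """
    height, width = len(key_probs), len(key_probs[0])
    num_classes = len(key_probs[0][0])
    # Start the accumulator from the keyframe's own prediction.
    log_acc = [[[math.log(p + eps) for p in key_probs[v][u]]
                for u in range(width)] for v in range(height)]
    for probs, depth, T in views:
        for v in range(height):
            for u in range(width):
                d = depth[v][u]
                if d <= 0:
                    continue  # skip pixels without a valid depth reading
                target = warp_pixel(u, v, d, K, T)
                if target is None:
                    continue
                uk, vk = target
                if 0 <= uk < width and 0 <= vk < height:
                    for c in range(num_classes):
                        log_acc[vk][uk][c] += math.log(probs[v][u][c] + eps)
    # Convert accumulated log-probabilities back to normalised probabilities.
    fused = []
    for v in range(height):
        row = []
        for u in range(width):
            peak = max(log_acc[v][u])
            exps = [math.exp(value - peak) for value in log_acc[v][u]]
            total = sum(exps)
            row.append([e / total for e in exps])
        fused.append(row)
    return fused
```

Calling `fuse_into_keyframe(key_probs, [(probs_a, depth_a, T_a), (probs_b, depth_b, T_b)], K)` returns a keyframe-sized grid of fused class distributions. Averaging probabilities instead of summing log-probabilities would be an equally simple alternative, and a depth test would be needed to handle occlusions.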
