Abstract
In recent years, with the rapid development of the Transformer in natural language processing, many researchers have recognized its potential and applied it to computer vision, producing a proliferation of approaches represented by the Vision Transformer (ViT) and the Data-efficient image Transformer (DeiT). Building on ViT, the well-known Swin Transformer was proposed and has become one of the strongest backbones for computer vision, widely used in tasks such as image classification, object detection, and video recognition. In image segmentation, however, semantic segmentation of indoor scenes remains very challenging due to the wide variety of object categories, large differences in object scale, and the many overlapping and occluded objects. To address the inability of existing RGB-D indoor semantic segmentation methods to effectively fuse multimodal features, we propose a novel indoor semantic segmentation algorithm based on the Swin Transformer. We apply the Swin Transformer to indoor RGB-D semantic segmentation and evaluate the model through extensive experiments on the mainstream indoor semantic segmentation datasets NYU-Depth V2 and SUN RGB-D. The experimental results show that the Swin-L RGB+Depth setting achieves 52.44% mIoU on NYU-Depth V2 and 51.15% mIoU on SUN RGB-D, demonstrating excellent performance in indoor semantic segmentation. By controlling the type of input features, the experiments also demonstrate that depth features improve the performance of the indoor semantic segmentation model. Our source code is publicly available at https://github.com/YunpingZheng/ISSSW.
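The abstract does not specify the fusion architecture, so the sketch below is only a minimal illustration of the dual-branch RGB+Depth idea it describes: two independent Swin encoders, one per modality, whose final-stage features are fused and decoded into per-pixel class logits. The class name `RGBDFusionSegmenter`, the choice of torchvision's Swin-T backbone, fusion by concatenation plus a 1x1 convolution, and the replication of depth to three channels are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of dual-branch RGB + Depth fusion with Swin encoders.
# NOT the paper's implementation; architecture and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import swin_t


class RGBDFusionSegmenter(nn.Module):
    def __init__(self, num_classes: int = 40):
        super().__init__()
        # Two independent Swin-T encoders, one per modality (assumption).
        self.rgb_encoder = swin_t(weights=None).features
        self.depth_encoder = swin_t(weights=None).features
        # Swin-T's final stage outputs 768 channels per branch;
        # fuse by concatenation followed by a 1x1 convolution.
        self.fuse = nn.Conv2d(2 * 768, 768, kernel_size=1)
        self.head = nn.Conv2d(768, num_classes, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # torchvision's Swin keeps channels last; move channels to dim 1.
        f_rgb = self.rgb_encoder(rgb).permute(0, 3, 1, 2)
        f_dep = self.depth_encoder(depth).permute(0, 3, 1, 2)
        fused = self.fuse(torch.cat([f_rgb, f_dep], dim=1))
        logits = self.head(fused)
        # Upsample coarse logits back to the input resolution.
        return F.interpolate(logits, size=rgb.shape[-2:],
                             mode="bilinear", align_corners=False)


model = RGBDFusionSegmenter(num_classes=40)  # NYU-Depth V2 uses a 40-class setup
rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 3, 224, 224)  # depth replicated to 3 channels (assumption)
print(model(rgb, depth).shape)  # torch.Size([1, 40, 224, 224])
```

A single fused stage keeps the sketch short; a full segmentation model would more plausibly fuse features at every Swin stage and use a multi-scale decoder such as UPerNet.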