Abstract
RGB-D image based indoor scene recognition is a challenging task due to complex scene layouts and cluttered objects. Although the depth modality provides extra geometric information, how best to learn multi-modal features remains an open problem. Considering this, in this paper we propose modality separation networks to extract modal-consistent and modal-specific features simultaneously. The motivation of this work is twofold: 1) to explicitly learn what is unique to each modality and what is common between the two modalities; 2) to explore the relationship between global/local features and modal-specific/consistent features. To this end, the proposed framework contains two branches of submodules to learn the multi-modal features. One branch extracts the individual characteristics of each modality by minimizing the similarity between the two modalities; the other branch learns their common information by maximizing a correlation term. Moreover, with a spatial attention module, our method can visualize the spatial positions that different submodules focus on. We evaluate our method on two public RGB-D scene recognition datasets, and the proposed framework achieves new state-of-the-art results.
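The two objectives described above — minimizing cross-modal similarity for the specific branch and maximizing cross-modal correlation for the consistent branch — might be sketched as follows. The abstract does not specify the exact similarity and correlation measures, so this sketch assumes a cross-correlation orthogonality penalty and a cosine-similarity agreement term; both choices are hypothetical illustrations, not the paper's stated formulation.

```python
import numpy as np

def modal_specific_loss(f_rgb, f_depth):
    """Penalize similarity between the modality-specific features of the
    two branches (shape: [batch, dim]). Here we use the squared Frobenius
    norm of the cross-correlation of mean-centered features, so the loss
    is zero when the two feature sets are decorrelated."""
    f_rgb = f_rgb - f_rgb.mean(axis=0)
    f_depth = f_depth - f_depth.mean(axis=0)
    return float(np.sum((f_rgb.T @ f_depth) ** 2))

def modal_consistent_loss(g_rgb, g_depth):
    """Encourage agreement between the modality-consistent features by
    maximizing per-sample cosine similarity (returned negated, so that
    gradient descent on this loss maximizes the correlation term)."""
    num = np.sum(g_rgb * g_depth, axis=1)
    den = np.linalg.norm(g_rgb, axis=1) * np.linalg.norm(g_depth, axis=1) + 1e-8
    return float(-np.mean(num / den))
```

In a training loop, the two terms would typically be combined with the classification loss via weighting coefficients, pushing the specific branches apart while pulling the consistent branches together.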