Abstract
Utilizing multi-level features has been shown to improve RGB-D scene recognition performance. However, simply fusing features after processing RGB and depth data separately may not preserve multi-modal integrity. In this work, we propose an effective multi-modal RGB-D scene recognition model that integrates global and local multi-scale/multi-semantic features. The proposed approach is built on two key components. In the first stage, multiple random recursive neural networks (RNNs) are applied to a baseline CNN model to obtain multi-scale encoded features from the multi-level feature hierarchy. In the second stage, multi-layer perceptrons (MLPs) learn global/local features at multiple levels while encouraging the correlation of mutual multi-modal features. Our learning design is based on the insight that correlated multi-modal features capture the complementary relation between the two modalities, which promotes better RGB-D scene recognition performance. In addition, the network is trained using a decisive fusion based on modality prediction confidence weights to yield the final RGB-D multi-modal prediction. Experiments on three RGB-D scene datasets verify the effectiveness of the proposed approach, which achieves superior or highly competitive results compared to state-of-the-art methods. Evaluation code and models are available at https://github.com/acaglayan/MMSNet.
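To make the confidence-weighted decision fusion concrete, the following is a minimal PyTorch sketch of how per-modality prediction confidences could weight the RGB and depth outputs before the final decision. The function name, the choice of maximum class probability as the confidence measure, and the normalization scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_fusion(rgb_logits: torch.Tensor,
                               depth_logits: torch.Tensor) -> torch.Tensor:
    """Sketch of confidence-weighted decision fusion (hypothetical helper).

    Each modality's softmax output is weighted by its own prediction
    confidence, so the more confident modality contributes more to the
    fused RGB-D decision.
    """
    rgb_probs = F.softmax(rgb_logits, dim=1)      # per-class probabilities, RGB branch
    depth_probs = F.softmax(depth_logits, dim=1)  # per-class probabilities, depth branch

    # Confidence of each modality: its maximum class probability (assumed measure).
    rgb_conf = rgb_probs.max(dim=1, keepdim=True).values
    depth_conf = depth_probs.max(dim=1, keepdim=True).values

    # Normalize the two confidences into fusion weights that sum to one.
    total = rgb_conf + depth_conf
    w_rgb, w_depth = rgb_conf / total, depth_conf / total

    # Weighted sum of modality predictions gives the fused RGB-D scores.
    return w_rgb * rgb_probs + w_depth * depth_probs


# Example: fuse dummy predictions for a batch of 4 samples over 19 scene classes.
fused = confidence_weighted_fusion(torch.randn(4, 19), torch.randn(4, 19))
print(fused.argmax(dim=1))  # predicted scene class per sample
```

Under this sketch, a modality that produces a sharply peaked class distribution dominates the fused prediction, while an uncertain modality is down-weighted rather than discarded.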