Learning a deeply supervised multi-modal RGB-D embedding for semantic scene and object category recognition

Hasan F. M. Zaki, Faisal Shafait, Ajmal Mian

doi:10.1016/j.robot.2017.02.008

Abstract

Recognizing the semantic category of objects and scenes captured by vision-based sensors is a challenging yet essential capability for mobile robots and UAVs performing high-level tasks such as long-term autonomous navigation. However, extracting discriminative features from multi-modal inputs, such as RGB-D images, in a unified manner is non-trivial given the heterogeneous nature of the modalities. We propose a deep network that seeks to construct a joint, shared multi-modal representation by bilinearly combining the convolutional neural network (CNN) streams of the RGB and depth channels. This technique encourages bilateral transfer learning between the modalities by taking the outer product of the feature extractor outputs of the two streams. Furthermore, we devise a technique for multi-scale feature abstraction using deeply supervised branches connected to all convolutional layers of the multi-stream CNN. We show that end-to-end learning of the network is feasible even with a limited amount of training data, and that the trained network generalizes across different datasets and applications. Experimental evaluations on benchmark RGB-D object and scene categorization datasets show that the proposed technique consistently outperforms state-of-the-art algorithms.
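
The bilinear combination mentioned in the abstract can be illustrated with a minimal sketch. It assumes each stream has already produced a one-dimensional feature vector. The names `bilinear_combine` and `l2_normalize`, the toy vectors, and the L2 normalization step are illustrative assumptions and are not taken from the paper, which may combine the streams at a different layer and with different post-processing.

```python
import math


def bilinear_combine(rgb_feat, depth_feat):
    """Return the flattened outer product of two feature vectors.

    Every RGB feature is multiplied with every depth feature, so the
    result has len(rgb_feat) * len(depth_feat) entries.
    """
    return [r * d for r in rgb_feat for d in depth_feat]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit length.

    Assumption: normalization is a common companion to bilinear
    features, but the abstract does not say whether it is used.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


# Hypothetical toy outputs of the RGB and depth streams.
rgb_feat = [1.0, 2.0]
depth_feat = [3.0, 4.0, 5.0]

joint = bilinear_combine(rgb_feat, depth_feat)
print(joint)       # [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
print(len(joint))  # 6
```

Because the joint vector grows with the product of the two input sizes, a practical network would likely apply this to compact feature maps or follow it with a dimensionality reduction step; that choice is left open here.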

Journal: Robotics and Autonomous Systems
Publication Date: Mar 10, 2017
Citations: 18
