Abstract
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel’s local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
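The core idea of the proposed encoding, as the abstract describes it, is to replace the expensive hand-crafted HHA channels with a convolution that lifts the single depth channel to three channels suitable for an RGB-pretrained network. A minimal NumPy sketch of that idea is below; the kernel bank here is random and illustrative (in the paper the encoding layer sits inside the network), not the authors' actual implementation:

```python
import numpy as np

def convolution_encode(depth, kernels):
    """Encode a single-channel depth map into a multi-channel image by
    convolving it with a bank of (in practice, learnable) square kernels.
    Illustrative stand-in for convolution-based encoding (CBE)."""
    k = kernels.shape[-1]              # kernel size (assumed square, odd)
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    # im2col-style sliding windows: shape (H, W, k, k)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    # contract each window with every kernel -> (n_channels, H, W)
    return np.einsum("hwij,cij->chw", windows, kernels)

rng = np.random.default_rng(0)
depth = rng.random((8, 8)).astype(np.float32)            # toy 8x8 depth map
kernels = rng.standard_normal((3, 3, 3)).astype(np.float32)  # 3 output channels

encoded = convolution_encode(depth, kernels)
print(encoded.shape)  # (3, 8, 8): three channels, matching an RGB input stem
```

Unlike HHA, which needs geometric quantities (disparity, height, surface normals) computed per pixel, this encoding is a single convolution pass, which is why it is cheap enough for real-time prediction.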
Highlights
RGBD convolutional neural network (CNN) with convolution-based encoding (CBE): an RGBD CNN with an added CBE layer was used in this setup
Experiments were performed with and without data augmentation, using both HHA encoding and convolution-based encoding
When the RGBD CNN was used without data augmentation and with depth images converted via HHA encoding, the scene classification accuracy obtained was 54.7%
Synthetic minority oversampling technique (SMOTE) oversampling was applied to features extracted at the output of the first dense layer of the trained RGBD CNN
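The SMOTE step above operates on feature vectors rather than images: minority-class samples are synthesized by interpolating between a real sample and one of its nearest minority-class neighbours. A self-contained sketch of that interpolation (a minimal re-implementation for illustration, not the imbalanced-learn API or the paper's exact code) is:

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority-class feature
    vectors by interpolating each seed sample with one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]           # k nearest neighbours per sample
    seeds = rng.integers(0, n, n_new)            # which sample to grow from
    neigh = nn[seeds, rng.integers(0, k, n_new)] # which neighbour to blend with
    gap = rng.random((n_new, 1))                 # interpolation factor in (0, 1)
    return X_min[seeds] + gap * (X_min[neigh] - X_min[seeds])

rng = np.random.default_rng(42)
feats = rng.random((20, 4))                      # 20 minority-class feature vectors
synth = smote_oversample(feats, n_new=30, k=3, rng=0)
print(synth.shape)  # (30, 4)
```

Because each synthetic vector lies on a segment between two real minority samples, the new points stay within the minority class's feature distribution, which is what makes oversampling at the dense-layer output reasonable before retraining the ablated network.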
Summary
Classical scene categorization systems extract image features and feed them to a classifier, such as a support vector machine (SVM) or a random forest. The success of these systems depends on choosing features relevant to the task. With the availability of large datasets containing millions of images, convolutional networks can learn highly discriminative, task-relevant features on their own. The class imbalance seen in the SUN RGB-D image dataset is addressed by applying SMOTE to the features extracted after training a deep convolutional network and using these features to retrain an ablated network.