Abstract
Scene recognition is an image recognition problem aimed at predicting the category of the place at which the image is taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The method is based on fusing the object and the scene information in the given image, and the CNN framework is named FOSNet (fusion of object and scene network). In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on the unique traits of a scene: the 'sceneness' spreads over the whole image, and the scene class does not change across the image. FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.37% on MIT indoor 67. The second highest performance of 77.28% was obtained on SUN 397.
Highlights
Scene recognition is one of the most spotlighted topics in image recognition, applied to image retrieval, autonomous robots, and drones.
Two problems limit existing approaches. (P1) First, the domain difference between object and scene information is not taken into consideration. (P2) Second, the standard cross-entropy loss is not enough for scene recognition, since scene recognition is quite different from general image recognition: a scene spreads over the whole image, and the class of the scene does not change across the image (see the sketch after these highlights).
In this paper, a new scene recognition framework named FOSNet has been proposed, in which the object and the scene information are combined in a trainable fusion module named correlative context gating (CCG).
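The scene coherence trait highlighted above lends itself naturally to a per-patch classification loss. The snippet below is a minimal sketch, assuming SCL is approximated by applying cross-entropy at every spatial location of a per-patch logit map against the single image-level label; the 1x1-conv head producing the patch logits and this exact formulation are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def scene_coherence_loss(patch_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-patch coherence loss (sketch, not the paper's exact SCL).

    patch_logits: (B, C, H, W) class scores from an assumed 1x1-conv classifier head.
    target:       (B,) image-level scene labels.
    Every spatial location is pushed toward the same image-level class, reflecting
    the trait that the scene class does not change over the image.
    """
    b, _, h, w = patch_logits.shape
    # Repeat the image-level label for every spatial location of the feature map.
    per_patch_target = target.view(b, 1, 1).expand(b, h, w)
    return F.cross_entropy(patch_logits, per_patch_target)
```

Under this reading, SCL supplements the ordinary image-level cross-entropy by penalizing patches whose predicted class drifts away from the scene label.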
Summary
Scene recognition is one of the most spotlighted topics in image recognition, applied to image retrieval, autonomous robots, and drones. Two problems motivate this work: (P1) the domain difference between object and scene information is not taken into consideration, and (P2) the standard cross-entropy loss function is not enough for scene recognition, since scene recognition is quite different from general image recognition: a scene spreads over the whole image, and the class of the scene does not change across the image. The proposed network is named FOSNet since it is based on the effective fusion of the object and the scene information in the given image. To solve problem (P1), a new object-scene fusion framework named correlative context gating (CCG) is developed; a sketch of this gating idea is given below. The contributions of FOSNet are as follows: 1) a new fusion framework named CCG is proposed to combine the object and scene features from the image.
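As a hedged illustration of the CCG fusion idea, the sketch below gates scene features with a signal computed from object features before classification. The single-stage gating, the layer sizes, and the class count are assumptions chosen for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CorrelativeContextGatingSketch(nn.Module):
    """Illustrative object-scene fusion by gating (assumed dimensions)."""

    def __init__(self, object_dim: int = 1000, scene_dim: int = 2048, num_classes: int = 365):
        super().__init__()
        # Gate computed from object features and applied element-wise to scene features.
        self.gate = nn.Sequential(nn.Linear(object_dim, scene_dim), nn.Sigmoid())
        self.classifier = nn.Linear(scene_dim, num_classes)

    def forward(self, object_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        gated = scene_feat * self.gate(object_feat)  # re-weight scene features by object context
        return self.classifier(gated)

# Example usage with dummy object/scene feature vectors:
# fusion = CorrelativeContextGatingSketch()
# logits = fusion(torch.randn(4, 1000), torch.randn(4, 2048))  # -> (4, 365)
```

The design intent is that object evidence modulates, rather than simply concatenates with, the scene representation, which is one plausible way to bridge the object-scene domain difference named in (P1).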