Abstract

Scene recognition is an image recognition problem aimed at predicting the category of the place at which an image is taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The method is based on the fusion of the object and the scene information in the given image, and the CNN framework is named FOSNet (fusion of object and scene network). In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on a unique trait of scenes: the 'sceneness' spreads across the image, and the scene class does not change over the entire image. FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.37% on MIT Indoor 67. The second-highest performance of 77.28% was obtained on SUN 397.
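To make the scene coherence idea concrete, the following Python sketch (using PyTorch) shows one way such a loss could look: per-patch scene predictions are encouraged to agree with their spatial neighbors, and the term is added to the usual cross-entropy loss. The tensor shapes, class count, and loss weight here are illustrative assumptions, not the paper's exact SCL formulation.

    # Hypothetical sketch of a scene-coherence-style loss (PyTorch).
    # Assumption: the network produces a per-patch class score map of shape
    # (N, C, H, W); neighboring patches are encouraged to predict the same
    # scene class, reflecting the trait that the scene class does not change
    # over the image. This is illustrative, not the paper's exact SCL.
    import torch
    import torch.nn.functional as F

    def scene_coherence_loss(score_map: torch.Tensor) -> torch.Tensor:
        """Penalize disagreement between scene predictions of adjacent patches."""
        probs = F.softmax(score_map, dim=1)  # per-patch class distributions
        # Squared difference between vertically / horizontally adjacent patches.
        dh = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).pow(2).mean()
        dw = (probs[:, :, :, 1:] - probs[:, :, :, :-1]).pow(2).mean()
        return dh + dw

    # Usage: combined with cross-entropy on the globally pooled prediction.
    scores = torch.randn(2, 365, 7, 7, requires_grad=True)  # e.g., 365 Places classes
    labels = torch.randint(0, 365, (2,))
    pooled = scores.mean(dim=(2, 3))                         # global average pooling
    loss = F.cross_entropy(pooled, labels) + 0.1 * scene_coherence_loss(scores)
    loss.backward()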

Highlights

  • Scene recognition is one of the most spotlighted topics in image recognition, with applications to image retrieval, autonomous robots, and drones

  • Two problems are identified in existing approaches: (P1) the domain difference between object and scene features is not taken into consideration when the two are combined; (P2) the standard cross-entropy loss is not sufficient for scene recognition, since scene recognition differs from general image recognition: a scene spreads over the entire image, and the class of the scene does not change across the image

  • In this paper, a new scene recognition framework named FOSNet has been proposed, in which the object and the scene information have been combined in a trainable fusion module named correlative context gating (CCG)


Summary

INTRODUCTION

Scene recognition is one of the most spotlighted topics in image recognition, with applications to image retrieval, autonomous robots, and drones. Two problems are identified in existing approaches: (P1) the domain difference between object and scene features is not taken into consideration when the two are combined; (P2) the standard cross-entropy loss is not sufficient for scene recognition, since scene recognition differs from general image recognition: a scene spreads over the entire image, and the class of the scene does not change across the image. The proposed network is named FOSNet because it is based on the effective fusion of the object and the scene information in the given image. To solve (P1), a new object-scene fusion framework named correlative context gating (CCG) is developed. The contributions of FOSNet are as follows: 1) a new fusion framework named CCG is proposed to combine the object and scene features from the image; 2) a new loss, SCL, is developed to exploit the trait that the scene class does not change over the entire image.
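As a rough illustration of gated object-scene fusion, the sketch below combines pooled object features (e.g., from an ImageNet-trained backbone) and scene features (e.g., from a Places-trained backbone) with a learned sigmoid gate before classification. The module name, feature dimensions, and gating form are assumptions for illustration; the actual CCG module is the one defined in the paper.

    # Minimal sketch of a gating-based object/scene feature fusion (PyTorch).
    # Assumption: "object" features come from an ImageNet-trained backbone and
    # "scene" features from a Places-trained backbone; the sigmoid-gated fusion
    # below is illustrative, not the exact CCG design of the paper.
    import torch
    import torch.nn as nn

    class GatedObjectSceneFusion(nn.Module):
        def __init__(self, obj_dim: int, scene_dim: int, num_classes: int):
            super().__init__()
            # Gate conditioned on both streams, applied to the scene features.
            self.gate = nn.Sequential(
                nn.Linear(obj_dim + scene_dim, scene_dim),
                nn.Sigmoid(),
            )
            self.classifier = nn.Linear(scene_dim, num_classes)

        def forward(self, obj_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
            g = self.gate(torch.cat([obj_feat, scene_feat], dim=1))  # values in [0, 1]
            fused = g * scene_feat                                   # gated scene representation
            return self.classifier(fused)

    # Usage with hypothetical feature dimensions:
    fusion = GatedObjectSceneFusion(obj_dim=2048, scene_dim=2048, num_classes=365)
    obj_feat = torch.randn(4, 2048)        # pooled object-stream features
    scene_feat = torch.randn(4, 2048)      # pooled scene-stream features
    logits = fusion(obj_feat, scene_feat)  # shape (4, 365)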

RELATED WORKS
FUSION OF OBJECT FEATURE AND SCENE FEATURE
EXPERIMENTS
ABLATION STUDY
Findings
CONCLUSION
