Abstract
Scene recognition is an image recognition problem aimed at predicting the category of the place at which the image is taken. In this paper, a new scene recognition method using a convolutional neural network (CNN) is proposed. The method is based on fusing the object and the scene information in the given image, and the CNN framework is named FOSNet (fusion of object and scene network). In addition, a new loss named scene coherence loss (SCL) is developed to train FOSNet and to improve scene recognition performance. The proposed SCL is based on the unique traits of a scene: the 'sceneness' spreads over the whole image, and the scene class does not change across the image. FOSNet was evaluated on the three most popular scene recognition datasets, and state-of-the-art performance was obtained on two of them: 60.14% on Places 2 and 90.37% on MIT indoor 67. The second highest performance of 77.28% was obtained on SUN 397.
Highlights
Scene recognition is one of the most spotlighted topics in image recognition, applied to image retrieval, autonomous robots, and drones.
Two problems limit existing approaches. (P1) First, the domain difference between object and scene information is not taken into consideration. (P2) Second, the standard cross-entropy loss is not enough for scene recognition, since scene recognition is quite different from general image recognition: a scene spreads over the whole image, and the class of the scene does not change across the image (see the sketch after these highlights).
In this paper, a new scene recognition framework named FOSNet has been proposed, in which the object and the scene information are combined in a trainable fusion module named correlative context gating (CCG).
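The scene coherence trait highlighted above lends itself naturally to a per-patch classification loss. The snippet below is a minimal sketch, assuming SCL is approximated by applying cross-entropy at every spatial location of a per-patch logit map against the single image-level label; the 1x1-conv head producing the patch logits and this exact formulation are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def scene_coherence_loss(patch_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical per-patch coherence loss (sketch, not the paper's exact SCL).

    patch_logits: (B, C, H, W) class scores from an assumed 1x1-conv classifier head.
    target:       (B,) image-level scene labels.
    Every spatial location is pushed toward the same image-level class, reflecting
    the trait that the scene class does not change over the image.
    """
    b, _, h, w = patch_logits.shape
    # Repeat the image-level label for every spatial location of the feature map.
    per_patch_target = target.view(b, 1, 1).expand(b, h, w)
    return F.cross_entropy(patch_logits, per_patch_target)
```

Under this reading, SCL supplements the ordinary image-level cross-entropy by penalizing patches whose predicted class drifts away from the scene label.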
Summary
Scene recognition is one of the most spotlighted topics in image recognition, applied to image retrieval, autonomous robots, and drones. Two problems motivate this work: (P1) the domain difference between object and scene information is not taken into consideration, and (P2) the standard cross-entropy loss function is not enough for scene recognition, since scene recognition is quite different from general image recognition: a scene spreads over the whole image, and the class of the scene does not change across the image. The proposed network is named FOSNet since it is based on the effective fusion of the object and the scene information in the given image. To solve problem (P1), a new object-scene fusion framework named correlative context gating (CCG) is developed; a sketch of this gating idea is given below. The contributions of FOSNet are as follows: 1) a new fusion framework named CCG is proposed to combine the object and scene features from the image.
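As a hedged illustration of the CCG fusion idea, the sketch below gates scene features with a signal computed from object features before classification. The single-stage gating, the layer sizes, and the class count are assumptions chosen for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CorrelativeContextGatingSketch(nn.Module):
    """Illustrative object-scene fusion by gating (assumed dimensions)."""

    def __init__(self, object_dim: int = 1000, scene_dim: int = 2048, num_classes: int = 365):
        super().__init__()
        # Gate computed from object features and applied element-wise to scene features.
        self.gate = nn.Sequential(nn.Linear(object_dim, scene_dim), nn.Sigmoid())
        self.classifier = nn.Linear(scene_dim, num_classes)

    def forward(self, object_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        gated = scene_feat * self.gate(object_feat)  # re-weight scene features by object context
        return self.classifier(gated)

# Example usage with dummy object/scene feature vectors:
# fusion = CorrelativeContextGatingSketch()
# logits = fusion(torch.randn(4, 1000), torch.randn(4, 2048))  # -> (4, 365)
```

The design intent is that object evidence modulates, rather than simply concatenates with, the scene representation, which is one plausible way to bridge the object-scene domain difference named in (P1).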