Abstract

Scene images typically exhibit large intra-class variation and high inter-class similarity because of complicated appearances, subtle differences, and ambiguous categorization. Hence, it is difficult to achieve satisfactory accuracy with a single representation. To address this issue, we present a comprehensive representation for scene recognition that fuses deep features extracted from three discriminative views: object semantics, global appearance, and contextual appearance. These views provide diverse and complementary features. The object semantics representation of the scene image, denoted as spatial-layout-maintained object semantics features, is extracted from the output of a deep-learning-based multi-class detector by using spatial Fisher vectors, which simultaneously encode the category and layout information of objects. A multi-direction long short-term memory-based model is built to represent the contextual information of the scene image, and the activation of the fully connected layer of a convolutional neural network is used to represent the global appearance of the scene image. These three kinds of deep features are then fused to produce the final scene recognition decision. Extensive experiments are conducted to evaluate the proposed comprehensive representation on three benchmark scene image databases. The results show that the three deep features strongly complement each other and improve recognition performance after fusion. The proposed method achieves scene recognition accuracies of 89.51% on the MIT67 database, 78.93% on the SUN397 database, and 57.27% on the Places365 database, which are higher than the accuracies obtained by the latest reported deep-learning-based scene recognition methods.
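
The abstract does not state how the three deep features are fused. As a minimal illustrative sketch only, assuming simple feature-level fusion in which each view is L2-normalized and the results are concatenated, the combination step might look as follows. The function names `l2_normalize` and `fuse_views` and the toy vectors are hypothetical and are not taken from the paper; the actual method may use a different fusion scheme.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def fuse_views(
    object_semantics: Sequence[float],
    global_appearance: Sequence[float],
    contextual_appearance: Sequence[float],
) -> List[float]:
    """Concatenate the per-view L2-normalized features into one descriptor.

    Assumption: plain concatenation stands in for the paper's fusion step,
    which the abstract does not describe in detail.
    """
    fused: List[float] = []
    for view in (object_semantics, global_appearance, contextual_appearance):
        fused.extend(l2_normalize(view))
    return fused


# Toy stand-ins for the three views (real features would be high-dimensional).
obj_feat = [3.0, 4.0]        # e.g. a spatial Fisher vector of detector outputs
glob_feat = [1.0, 0.0, 0.0]  # e.g. a CNN fully connected layer activation
ctx_feat = [0.0, 2.0]        # e.g. a multi-direction LSTM representation

fused = fuse_views(obj_feat, glob_feat, ctx_feat)
```

Normalizing each view before concatenation keeps any one view from dominating the fused descriptor purely because of its scale; the fused vector would then be passed to a classifier of the reader's choice.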
