Abstract

Zero-shot visual recognition aims to classify images whose classes have not been seen during training. Most current approaches first project the attribute semantic features and image features into a common space, and then use manually defined similarity measures or linear classifiers to recognize unseen images. Different from existing studies, in this paper we propose a novel Semantic Attention-based Compare Network (SACN), which comprises a visual feature extraction module, a feature fusion module, and a similarity compare module. Our SACN has several advantages: (1) in the visual feature extraction module, a convolutional neural network (CNN) with multiple losses is introduced to extract more discriminative visual features; (2) in the feature fusion module, an attribute semantic attention mechanism is proposed to associate visual features with class semantic representations; (3) in the similarity compare module, a distance compare network is presented to predict the similarities between the test image and all unseen classes. To evaluate the proposed model, we conducted extensive experiments on two widely used benchmarks, and both qualitative and quantitative results demonstrate the effectiveness of the proposed SACN. Moreover, the proposed method is the 2nd-place solution in the AI Challenger 2018 (Global AI Contest).
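The three-module pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the dimensions, the sigmoid-gated attention fusion, and the fixed linear scorer standing in for the learned distance compare network are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 512-d visual feature,
# 85-d class semantic (attribute) vector, 5 unseen classes.
visual_feat = rng.standard_normal(512)          # output of the CNN extractor
class_semantics = rng.standard_normal((5, 85))  # one attribute vector per unseen class

# Feature fusion via a simple semantic-attention weighting (illustrative only):
# project the semantics into the visual space, gate the visual feature, concatenate.
W_proj = rng.standard_normal((85, 512)) * 0.01

def fuse(visual, semantic):
    projected = semantic @ W_proj                 # semantic -> visual space
    attention = 1.0 / (1.0 + np.exp(-projected))  # sigmoid attention weights
    return np.concatenate([visual * attention, projected])

# Stand-in for the distance compare network: a learned scorer is replaced
# here by a fixed linear map so the example stays self-contained.
W_score = rng.standard_normal(1024) * 0.01

# Score the test image against every unseen class and take the most similar.
scores = np.array([fuse(visual_feat, s) @ W_score for s in class_semantics])
predicted_class = int(np.argmax(scores))
```

In the actual model, `W_proj` and `W_score` would be trained jointly with the CNN rather than fixed.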

Highlights

  • With the development of deep learning, deep neural networks have achieved state-of-the-art (SOTA) performance on various visual tasks [1]–[4]

  • Zero-shot visual recognition aims to learn a model from training images and their corresponding semantic representations that can predict the class of a test image belonging to one of the unseen classes

  • A distance compare network is presented to predict the similarities of the test image with all unseen classes


Summary

INTRODUCTION

With the development of deep learning, deep neural networks have achieved state-of-the-art (SOTA) performance on various visual tasks [1]–[4]. Zero-shot visual recognition aims to learn a model from training images (whose labels belong to the seen classes) and their corresponding semantic representations, and to predict the class of a test image that belongs to one of the unseen classes. Works [5]–[8] on zero-shot visual recognition utilize attributes as side information and infer the class of a test image via a two-stage approach: given an image, they first predict its attributes, then deem the class with the most similar attributes to be its label.
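The two-stage attribute-based approach can be sketched as follows. The class names, binary attribute signatures, and the Euclidean distance used for "most similar attributes" are illustrative assumptions; stage 1 (predicting attributes from the image) is stubbed with a fixed vector.

```python
import numpy as np

# Hypothetical attribute signatures for three unseen classes
# (e.g. striped, four-legged, aquatic, black-and-white).
class_attributes = {
    "zebra":   np.array([1, 1, 0, 1]),
    "whale":   np.array([0, 0, 1, 0]),
    "leopard": np.array([1, 1, 0, 0]),
}

def predict_class(predicted_attributes):
    """Stage 2: pick the class whose attribute signature is closest
    (here, by Euclidean distance) to the attributes from stage 1."""
    return min(class_attributes,
               key=lambda c: np.linalg.norm(class_attributes[c] - predicted_attributes))

# Stage 1 (attribute prediction from the image) is stubbed for the example.
attrs = np.array([0.9, 0.8, 0.1, 0.2])
print(predict_class(attrs))  # → leopard
```

The limitation the paper targets is visible here: the two stages are optimized separately, so attribute-prediction errors propagate directly into the class decision.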

RELATED WORK
HYBRID MODEL-BASED ZERO-SHOT VISUAL RECOGNITION
PROBLEM STATEMENT
SIMILARITY COMPARE
COST FUNCTION
EXPERIMENTS
CONCLUSION
FUTURE WORK
