Abstract

Because large-scale datasets must be annotated to fit specific tasks, Zero-Shot Learning (ZSL) has attracted considerable attention and made significant progress in recent research, driven by the prevalence of deep neural networks. At present, ZSL is mainly addressed by exploiting auxiliary information, such as semantic attributes and text descriptions, and then employing a mapping method to bridge the gap between the visual and semantic spaces. However, because this auxiliary information is not used effectively, the problem remains far from solved. Inspired by previous work, we argue that using the visual space as the embedding space yields a stronger ability to express the precise characteristics of the semantic information. Meanwhile, we observe that the annotated information in public datasets contains some noisy attributes that need to be handled. Based on these considerations, we propose an end-to-end method with a convolutional architecture, instead of the conventional linear projection, to provide a deep representation of the semantic information for ZSL. After being fed into our method, semantic features express more detailed and precise information. In addition, we use word embeddings to generate superclasses for the original classes and propose a new loss function over these superclasses to assist training. Experiments show that our method achieves decent improvements for both ZSL and Generalized Zero-Shot Learning (GZSL) on several public datasets.
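The abstract does not specify the architecture, so the following is only a minimal NumPy sketch of the general idea it describes: embedding a class's semantic attribute vector into the visual feature space through a convolutional mapping (rather than a linear projection), then classifying a visual feature by its nearest class prototype. All dimensions, weights, and the toy attribute data here are hypothetical illustrations, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: A attributes per class, K conv filters of width W,
# D-dimensional visual feature space.
A, W, K, D = 85, 5, 16, 32
kernels = rng.standard_normal((K, W)) * 0.1   # toy, untrained conv filters
head = rng.standard_normal((D, K)) * 0.1      # toy linear head into visual space

def embed(attr):
    """Map one attribute vector into the visual space via a toy 1-D conv net."""
    # valid 1-D cross-correlation over the attribute vector, one map per filter
    maps = np.stack([np.convolve(attr, k[::-1], mode="valid") for k in kernels])
    maps = np.maximum(maps, 0.0)      # ReLU
    return head @ maps.max(axis=1)    # global max-pool, then project to D dims

# Class prototypes in visual space: embed each class's attribute vector.
attrs = rng.standard_normal((10, A))            # 10 toy (unseen) classes
prototypes = np.stack([embed(a) for a in attrs])

def classify(visual_feat):
    """Assign a visual feature to the nearest class prototype (cosine)."""
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    v = visual_feat / np.linalg.norm(visual_feat)
    return int(np.argmax(p @ v))
```

In this sketch the filters are random rather than learned end-to-end as the abstract proposes, and the superclass loss is omitted; the point is only the data flow of attributes → convolutional embedding → visual space → nearest-prototype prediction.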
