Abstract

Zero-Shot Learning (ZSL) aims to recognize categories that are never seen during training. Many ZSL methods are available, and the number is steadily increasing. Even so, several issues remain unresolved, notably class embedding and image feature extraction. Recent work on class embedding has relied on human-annotated attributes. However, such attributes do not adequately capture the semantic and visual characteristics of each class, and annotating them is time-consuming. Furthermore, for image feature extraction, ZSL methods rely on pre-trained or fine-tuned image representations and focus on learning an appropriate mapping between image representations and attributes. To reduce the dependence on manual annotation and improve classification performance, we believe that ZSL would benefit from using Contrastive Language-Image Pre-Training (CLIP), either alone or combined with manual annotation. To this end, we propose an improved ZSL model named UBZSL. It uses CLIP combined with manual annotation as its class embedding method and uses an attention map for feature extraction. Experiments show that the performance of our model on the CUB dataset is greatly improved compared to current models.
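Since the abstract describes using CLIP text embeddings together with human-annotated attributes as class embeddings, the sketch below illustrates one way this could look with the openai/clip package. The prompt template, the random placeholder attribute vectors, and the fusion by simple concatenation are illustrative assumptions, not the paper's actual UBZSL procedure.

```python
import torch
import clip

# Hypothetical sketch: combine CLIP text features with human-annotated
# attribute vectors to form class embeddings. The abstract does not specify
# how UBZSL fuses the two, so concatenation is assumed here for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["Black footed Albatross", "Laysan Albatross"]  # example CUB classes
prompts = clip.tokenize([f"a photo of a {name}" for name in class_names]).to(device)

with torch.no_grad():
    text_features = model.encode_text(prompts).float()          # (num_classes, 512)
    text_features /= text_features.norm(dim=-1, keepdim=True)   # L2-normalize

# Human-annotated attribute vectors (CUB provides 312 per class);
# random placeholders are used here instead of the real annotations.
attributes = torch.rand(len(class_names), 312, device=device)

# Assumed fusion: concatenate CLIP text features with attribute vectors.
class_embeddings = torch.cat([text_features, attributes], dim=-1)
print(class_embeddings.shape)  # torch.Size([2, 824])
```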
