Abstract

Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task for two main reasons: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first feature learning phase, we finetune deep convolutional neural networks using hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is induced into deep convolutional neural networks to avoid domain shift from training data to test data. In the second label inference phase, a semantic directed graph is constructed over attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms the state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when they are used by other zero-shot learning models, which shows the flexility of our model in zero-shot fine-grained classification.

Highlights

  • Fine-grained image classification, which aims to recognize subordinate level categories, has emerged as a popular research area in the computer vision community[1,2,3,4,5]

  • Based on the semantic directed graph and the discriminative features obtained by our feature learning model, we develop a label propagation algorithm to infer the labels of images in the unseen classes

  • Experimental results demonstrate that the proposed model outperforms the state-of-the-art zero-shot learning models in the task of zero-shot finegrained classification

Read more

Summary

Introduction

Fine-grained image classification, which aims to recognize subordinate level categories, has emerged as a popular research area in the computer vision community[1,2,3,4,5]. Considering the lack of training data for every class in fine-grained classification, we can adopt zero-shot learning to recognize images from unseen classes without labelled training data. Conventional zero-shot learning algorithms mainly explore the semantic relationship among classes (using textual information) and attempt to learn a match between images and their textual descriptions[13,14,15]. In other words, this rarely works on zero-shot learning focus on feature learning. This rarely works on zero-shot learning focus on feature learning This is really bad for fine-grained classification, since it requires more discriminative features than general image recognition. We must focus on feature learning for zero-shot fine-grained image classification

Objectives
Methods
Findings
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call