Abstract

Fine-grained visual classification (FGVC), which aims to distinguish subtle differences among subcategories, is an important computer vision task. However, model performance is limited by the diversity within subcategories. To tackle this problem, we propose a simple yet effective approach named category similarity-based distributed labeling (CSDL). Specifically, we first obtain the feature centers of the subcategories and use them to initialize the label distributions. We then replace the ground-truth labels with the distributed labels when calculating the loss and optimizing the deep neural network (DNN). Finally, joint supervision by a softmax loss and a center loss is adopted to update the parameters of the DNN, the deep feature centers, and the distributed labels, so that the network learns discriminative deep features. Comprehensive experiments on three publicly available FGVC datasets demonstrate the superiority of our proposed approach.

Highlights

  • Distinguishing subtle differences among fine-grained categories is an extremely difficult computer vision task

  • The distinction between fine-grained visual classification (FGVC) and traditional visual classification (e.g., ImageNet [60] categorization) lies in two aspects: (i) subcategories are visually similar and harder to distinguish, and (ii) there are fewer training samples for FGVC and the training set might not be representative of the practical scenario

  • We propose a simple yet effective weakly supervised approach, namely category similarity-based distributed labeling (CSDL), whose main idea is to (1) adopt the center loss to promote feature compactness and obtain class centers; (2) perform distributed labeling based on the feature similarity between class centers to mitigate overconfident predictions; (3) dynamically update the distributed labels throughout the whole training process
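The three steps in the last highlight can be illustrated with a small sketch. It is not the authors' implementation: the excerpt does not say how similarity between class centers is measured or how it is turned into a label distribution, so the cosine similarity, the softmax with a `temperature`, the weight `lam`, and all function names below are assumptions chosen for illustration. The center loss term uses the common form of half the squared distance between a feature and its class center.

```python
import math

def class_centers(features, labels, num_classes):
    """Mean feature vector of each class (the feature centers)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += x[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]

def cosine(u, v):
    """Cosine similarity (assumed similarity measure, not from the source)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distributed_labels(centers, temperature=0.1):
    """Row k is a soft label for class k, built from center similarities."""
    return [
        softmax([cosine(ck, cj) / temperature for cj in centers])
        for ck in centers
    ]

def joint_loss(logits, feature, label, centers, soft_labels, lam=0.01):
    """Cross-entropy against the distributed label plus a weighted center loss."""
    probs = softmax(logits)
    target = soft_labels[label]
    ce = -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))
    center = 0.5 * sum((f - c) ** 2 for f, c in zip(feature, centers[label]))
    return ce + lam * center

# Toy data: classes 0 and 1 are close in feature space, class 2 is far away.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]]
labels = [0, 0, 1, 2]
centers = class_centers(features, labels, num_classes=3)
soft = distributed_labels(centers)
loss = joint_loss([2.0, 1.0, -1.0], features[0], labels[0], centers, soft)
```

In this sketch the soft label for class 0 keeps most of its mass on class 0, gives some to the similar class 1, and almost none to the dissimilar class 2, which is the mechanism the highlight describes for mitigating overconfident predictions. Dynamic updating would correspond to recomputing `centers` and `soft` as the features change during training.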


Summary

Introduction

Distinguishing subtle differences among fine-grained categories (e.g., different kinds of birds [3], aircraft [9], or cars [10]) is an extremely difficult computer vision task. Identifying these differences is challenging even for an expert with specific knowledge, because subcategories are visually similar to each other. For example, both "Caspian Tern" and "Arctic Tern" have a white head with a black cap, a white neck, and gray wings. Such subcategories are especially difficult for a non-expert to distinguish because they share a similar global appearance and can only be differentiated by subtle differences in small regions. The distinction between FGVC and traditional visual classification (e.g., ImageNet [60] categorization) lies in two aspects: (i) subcategories are visually similar and harder to distinguish, and (ii) there are fewer training samples for FGVC, and the training set might not be representative of the practical scenario.

