Abstract

Fine-Grained Visual Categorization (FGVC) is a challenging task due to the large intra-subcategory and small inter-subcategory variances. Recent studies tackle this task through a weakly supervised manner without using the part annotation from the experts. Of those, methods based on bilinear pooling are one of the main categories for computing the interaction between deep features and have shown high effectiveness. However, these methods mainly focus on the correlation within one specific layer but largely ignore the high interactions between multiple layers. In this study, we argue that considering the high interaction between the features from multiple layers can help to learn more distinguishing fine-grained features. To this end, we propose a High-Order-Interaction (HOI) method for FGVC. In our HOI, an efficient cross-layer trilinear pooling is introduced to calculate the third-order interaction between three different layers. Third-order interactions of different combinations are then fused to form the final representation. HOI can produce more discriminative representations and be readily integrated with the two popular techniques, attention mechanism and triplet loss, to obtain superposed improvement. Extensive experiments conducted on four FGVC datasets show the great superiority of our method over bilinear-based methods and demonstrate that the proposed method achieves the state of the art.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call