Fine-grained visual categorization (FGVC) is challenging because fine-grained classes differ only subtly in particular local regions, which produces many hard examples. To address this issue, some methods resort to high-resolution source images, while others adopt regularization techniques such as "mixup" or "between-class learning." Despite their promising results, both approaches have drawbacks: mixup tends to cause the manifold intrusion problem, which leads to under-fitting and degrades model performance, and high-resolution input inevitably incurs high computational cost. In view of this, we present a multiresolution discriminative mixup network (MRDMN). Unlike standard mixup, the proposed discriminative mixup strategy linearly mixes discriminative regions rather than entire images to avoid manifold intrusion. This lets the network learn local detail features more effectively and contributes to more precise categorization. Furthermore, a resolution-based distillation strategy is designed to transfer the multiresolution detail feature representations to a low-resolution network, which both speeds up testing and improves categorization accuracy. Extensive experiments on the CUB-200-2011, Stanford-Cars, Stanford-Dogs, Food-101, and iNaturalist 2017 datasets demonstrate that the proposed MRDMN outperforms most competing approaches while requiring less computation time. The code is available at https://github.com/aztc/MRDMN.
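As a rough illustration of the difference between standard mixup and mixing only local regions, the following sketch blends two images either as a whole or only within a pair of equally sized boxes. It is not the MRDMN implementation: the abstract does not say how discriminative regions are located or how labels are combined, so the boxes, the function names (`mixup`, `crop`, `region_mixup`, `mix_labels`), and the linear label mixing for regions are assumptions made for illustration. Images are plain nested lists of grayscale values to keep the example dependency-free.

```python
def mixup(img_a, img_b, lam):
    """Standard mixup: blend two same-sized images pixel by pixel."""
    return [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def crop(img, box):
    """Cut out a region; box is (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in img[top:top + height]]


def region_mixup(img_a, box_a, img_b, box_b, lam):
    """Blend only two local regions instead of the entire images.

    The boxes are hypothetical inputs; in practice they would come
    from some localization step that is not modeled here.
    """
    if box_a[2:] != box_b[2:]:
        raise ValueError("regions must have the same height and width")
    return mixup(crop(img_a, box_a), crop(img_b, box_b), lam)


def mix_labels(label_a, label_b, lam):
    """Blend one-hot labels with the same coefficient (assumed here)."""
    return [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]


# Usage: two 4x4 toy images, one all zeros and one all ones.
img_a = [[0.0] * 4 for _ in range(4)]
img_b = [[1.0] * 4 for _ in range(4)]
lam = 0.25

# Mix the top-left 2x2 patch of img_a with the bottom-right 2x2 patch of img_b.
mixed = region_mixup(img_a, (0, 0, 2, 2), img_b, (2, 2, 2, 2), lam)
soft_label = mix_labels([1.0, 0.0], [0.0, 1.0], lam)
```

In this toy setting every pixel of `mixed` is a weighted average of the two patches, and `soft_label` carries the same weights. A real pipeline would draw `lam` at random per sample and operate on image tensors rather than nested lists.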