Abstract

Discovering diverse significant regions (e.g., the beak and wings of a bird) and extracting discriminative features from them is vitally important in fine-grained image recognition. Attention-based approaches currently show promising performance; they generally extract fine-grained features by cropping or sampling significant parts. However, cropping methods are usually limited to a fixed number of parts and have difficulty highlighting irregular regions, while existing sampling methods may produce extremely distorted images. To capture fine-grained features effectively, we propose an end-to-end separated smooth sampling network (SSSNet). Specifically, we propose a separated smooth sampling module that highlights diverse significant regions of an image. Unlike previous methods, we apply smooth sampling along two separated coordinates, which effectively highlights discriminative content while avoiding extreme distortion. We further propose an iterative masking method, embedded in SSSNet, that produces multiple non-overlapping attention maps representing different significant regions. We conduct extensive experiments on the CUB-200-2011, Stanford-Cars, and FGVC-Aircraft datasets. The results demonstrate the effectiveness of separated smooth sampling, and SSSNet outperforms previous state-of-the-art approaches under the same settings.
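
To illustrate the general idea of sampling along two separated coordinates, the following is a minimal sketch and not the paper's formulation, which the abstract does not specify. It assumes the attention map is collapsed into one 1D profile per axis (here by taking the maximum), that each profile is mixed with a uniform floor to limit distortion, and that source coordinates are chosen by inverting the cumulative distribution. The function names (`marginal`, `sample_coords`, `resample`) and the `floor` parameter are hypothetical.

```python
from bisect import bisect_left


def marginal(att, axis):
    """Collapse a 2D attention map (list of rows) into a 1D profile.

    axis=0 gives one value per row (y); axis=1 gives one value per column (x).
    Taking the maximum is an assumption made for this sketch.
    """
    if axis == 0:
        return [max(row) for row in att]
    return [max(col) for col in zip(*att)]


def sample_coords(profile, n_out, floor=0.3):
    """Pick n_out source indices, denser where the profile is high.

    The profile is normalised and mixed with a uniform distribution
    (weight `floor`), so low-attention regions still receive some samples.
    """
    n = len(profile)
    total = sum(profile)
    if total == 0:
        density = [1.0 / n] * n  # no attention signal: sample uniformly
    else:
        density = [(1 - floor) * p / total + floor / n for p in profile]

    # Cumulative distribution over source indices.
    cdf, acc = [], 0.0
    for d in density:
        acc += d
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift

    # Invert the CDF at evenly spaced quantiles (bin centres).
    coords = []
    for k in range(n_out):
        u = (k + 0.5) / n_out
        coords.append(min(bisect_left(cdf, u), n - 1))
    return coords


def resample(image, att, out_h, out_w, floor=0.3):
    """Resample `image` with independent row and column coordinates.

    Because x and y are warped separately, rows stay rows and columns stay
    columns; nearest-neighbour lookup keeps the sketch dependency-free.
    """
    ys = sample_coords(marginal(att, 0), out_h, floor)
    xs = sample_coords(marginal(att, 1), out_w, floor)
    return [[image[y][x] for x in xs] for y in ys]
```

With an attention map that peaks at a single pixel, the row and column passing through that pixel are repeated more often in the output than the others, while the uniform floor keeps the remaining rows and columns from vanishing entirely.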
