Abstract

Fine-grained recognition requires discriminating between categories that differ only in subtle visual details, which are easily overwhelmed by the diverse appearance within each category. Conventional approaches generally locate discriminative parts and then recognize part-based features. However, we find that tuning the effective receptive field (ERF) of the network to the task plays a key role, because it enables significant regions to contribute more to the output. Inspired by the receptive field stimulation mechanism of the visual cortex, we propose a Dynamic Perception framework as a solution. Our framework adapts the ERF by considering the image space and the kernel space simultaneously. In the image space, a Spatial Selective Sampling module enlarges informative regions locally. In the kernel space, Spatial Selective Kernel convolution embeds spatial attention in a multi-path convolution so that different kernel sizes are applied to regions of interest and to backgrounds. Extensive experiments on challenging benchmarks, including CUB-200-2011, FGVC-Aircraft, and Stanford Cars, demonstrate that our method improves on state-of-the-art methods.
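To make the kernel-space idea concrete, the following toy sketch fuses the outputs of two convolution paths with different kernel sizes, using a separate softmax attention weight for every spatial position. It is an illustration under stated assumptions and not the Spatial Selective Kernel module itself: the signal is one-dimensional, the kernels are fixed box filters, and the function names, the `score_weights` parameter, and the response-magnitude scoring rule are all hypothetical, whereas in a trained network the attention would be learned.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 'same' correlation of a 1D signal with an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += w * signal[j]
        out.append(acc)
    return out


def softmax(values):
    """Numerically stable softmax over a short list of scores."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_selective_fusion(signal, kernels, score_weights):
    """Fuse multi-path outputs with per-position attention over the paths.

    Assumption: the score of a path at a position is a hand-set scalar
    weight times the magnitude of that path's response. This stands in
    for a learned spatial attention branch.
    """
    paths = [conv1d_same(signal, k) for k in kernels]
    fused, attention = [], []
    for i in range(len(signal)):
        scores = [w * abs(p[i]) for w, p in zip(score_weights, paths)]
        a = softmax(scores)  # one weight per kernel size, summing to 1
        attention.append(a)
        fused.append(sum(ai * p[i] for ai, p in zip(a, paths)))
    return fused, attention


# Usage: a single impulse, a size-3 path and a size-5 path.
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernels = [[1 / 3] * 3, [1 / 5] * 5]
fused, attention = spatial_selective_fusion(signal, kernels, [4.0, 4.0])
```

Because the attention is computed independently at each position, positions near the impulse can favor one kernel size while distant positions favor another, which is the sense in which the selection is spatial rather than global.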
