Abstract

The gradients of a CNN are traditionally used for optimization and visualization. In this paper, we find that a discriminative representation is hidden in the gradients of the convolution filters. Based on this observation, we propose a corresponding feature extraction and aggregation method for fine-grained image retrieval (FGIR). First, we propose a metric for evaluating manually designed loss functions and, based on it, design a loss function derived from Grad-CAM that extracts the gradients of the convolution filters in the testing phase. Second, we treat these gradients as new features and design a succinct approach to aggregate them into a compact vector, which we name the Convolution Filters Gradient Aggregation (CFGA) feature. CFGA features can be extracted from both pre-trained and fine-tuned CNN models. Extensive FGIR experiments on two standard fine-grained retrieval datasets compare our CFGA approach against five supervised state-of-the-art methods and two unsupervised methods and verify its effectiveness. Moreover, we generalize the CFGA method, designed for CNNs, to the Swin Transformer and propose the Transformer parameter gradients aggregation (TPGA) method, which demonstrates that the core idea of CFGA/TPGA applies to mainstream feature extraction models. We achieve state-of-the-art FGIR performance on the CUB-200-2011 and CARS196 datasets.
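The pipeline the abstract describes — take the gradients of filter weights with respect to a class-score loss, flatten them, and normalize them into a compact descriptor compared by similarity at retrieval time — can be illustrated with a minimal, hedged sketch. This is not the authors' implementation: a linear score stands in for a real CNN and Grad-CAM-style loss, and all function names here are hypothetical.

```python
# Conceptual sketch (NOT the paper's code): gradients of "filter"
# weights w.r.t. a score are used as features, then aggregated into
# a compact L2-normalized descriptor for retrieval.
import math

def filter_gradient(weights, x):
    # Toy stand-in for backprop through a CNN: for a linear score
    # s = sum(w_i * x_i), the gradient ds/dw_i is simply x_i.
    return list(x)

def aggregate(grads):
    # Simplified CFGA-style aggregation: flatten all filter gradients
    # into one vector and L2-normalize it into a compact descriptor.
    flat = [g for grad in grads for g in grad]
    norm = math.sqrt(sum(g * g for g in flat)) or 1.0
    return [g / norm for g in flat]

def cosine(u, v):
    # Retrieval compares unit-norm descriptors by inner product.
    return sum(a * b for a, b in zip(u, v))

w = [0.5, -0.2, 0.1]                                   # hypothetical filter
query = aggregate([filter_gradient(w, [1.0, 2.0, 3.0])])
gallery = aggregate([filter_gradient(w, [1.1, 1.9, 3.2])])
print(cosine(query, gallery))   # similar inputs give similar descriptors
```

In the actual method the gradient would come from backpropagating a Grad-CAM-derived loss through the convolution filters of a pre-trained or fine-tuned model; the aggregation and similarity steps play the same role as above.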
