Abstract
Although transformers have improved image retrieval accuracy in computer vision, challenges remain, including insufficient and imbalanced feature extraction and the inability to produce compact binary codes. This study introduces a novel approach to image retrieval, Vision Transformer with Deep Hashing (VTDH), which combines a hybrid neural network with optimized metric learning. Our main contributions are as follows. First, we introduce a Strengthened External Attention (NEA) module that attends to features at multiple scales while assimilating global context, enriching the model's understanding of both overall structure and semantics. Second, we propose a new balanced loss function to address the imbalance between positive and negative samples within labels: it takes the sample labels as input and uses the mean of all sample labels to quantify the frequency gap between positive and negative samples, which, combined with a customized balance weight, effectively mitigates label imbalance. Third, we strengthen the quantization loss function, increasing its penalty when the model's binary code output exceeds ±1, which yields a more robust and stable hash code. The proposed method is evaluated on the CIFAR-10, NUS-WIDE, and ImageNet datasets. Experimental results show higher retrieval accuracy than current state-of-the-art techniques; notably, VTDH achieves a mean average precision (mAP) of 97.3% on CIFAR-10.
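To make the two loss components concrete, the following is a minimal PyTorch-style sketch of the ideas described in the abstract: a balance weight derived from the mean of the sample labels (the positive-label frequency) and a quantization penalty that punishes code entries exceeding ±1 more heavily. The names `balanced_pairwise_loss`, `quantization_loss`, `alpha`, `margin`, and `power` are illustrative assumptions; the exact formulations in the paper may differ.

```python
import torch
import torch.nn.functional as F

def balanced_pairwise_loss(hash_codes, labels, alpha=None):
    """Sketch of a label-balanced similarity loss (assumed formulation).

    hash_codes: (N, K) real-valued outputs of the hashing head.
    labels:     (N, C) multi-hot label matrix with entries in {0, 1}.
    alpha:      optional balance weight; if None, it is derived from the
                mean label value, i.e. the positive-label frequency.
    """
    labels = labels.float()
    # The abstract uses the mean of all sample labels to measure the
    # frequency gap between positive and negative labels.
    pos_freq = labels.mean()
    if alpha is None:
        # Up-weight the rarer positive pairs relative to the negatives.
        alpha = (1.0 - pos_freq) / (pos_freq + 1e-8)

    # Pairwise semantic similarity: 1 if two samples share at least one label.
    sim = (labels @ labels.t() > 0).float()

    # Pairwise inner products of hash codes, scaled to a logit-like range.
    logits = hash_codes @ hash_codes.t() / hash_codes.size(1)

    # Weighted binary cross-entropy over pairs; positive pairs get weight alpha.
    return F.binary_cross_entropy_with_logits(
        logits, sim, pos_weight=alpha.detach().clone()
        if torch.is_tensor(alpha) else torch.tensor(alpha))


def quantization_loss(hash_codes, margin=1.0, power=3):
    """Sketch of a strengthened quantization penalty (assumed formulation).

    A standard term pulls code magnitudes toward the target margin (±1),
    while an extra higher-order term penalizes outputs that exceed ±1
    more heavily, encouraging stable binary-like codes.
    """
    overshoot = torch.clamp(hash_codes.abs() - margin, min=0.0)
    return (overshoot ** power).mean() + ((hash_codes.abs() - margin) ** 2).mean()
```

As a usage note, both terms would typically be combined with the network's classification or similarity objective, e.g. `loss = balanced_pairwise_loss(codes, y) + lam * quantization_loss(codes)` with a tunable weight `lam`; the weighting scheme here is an assumption, not the paper's reported configuration.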