Abstract

Deep hashing methods have gained popularity in image retrieval because of their low storage requirements and high efficiency. However, existing deep hashing methods for large-scale image retrieval suffer from low discriminative power of binary hash codes, difficult loss optimization, and low retrieval accuracy. This paper proposes a single-loss hash image retrieval method based on an improved Vision Transformer to address these issues. The proposed method uses a ViT pre-trained on ImageNet as the backbone network, augmented with a hash coding layer, to extract image features more comprehensively. In addition, we design a single-objective loss function that jointly accounts for the discriminative power of the hash codes and the quantization error, eliminating the need to balance weights among multiple losses. Experimental evaluations on the ImageNet100, NUS-WIDE, CIFAR10, and MS-COCO datasets demonstrate that the proposed method outperforms contemporary methods, indicating its adaptability to diverse data.
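To make the described architecture concrete, the sketch below shows one way the pipeline could be assembled in PyTorch: a pre-trained ViT backbone with its classification head replaced by a hash coding layer, and a single combined objective consisting of a pairwise similarity term plus a quantization penalty. The abstract does not give the exact loss formulation, layer sizes, or hyperparameters, so every name and constant here (e.g., `ViTHash`, `single_objective_loss`, the margin and quantization weight) is an illustrative assumption rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vit_b_16, ViT_B_16_Weights


class ViTHash(nn.Module):
    """ViT backbone pre-trained on ImageNet followed by a hash coding layer (illustrative)."""

    def __init__(self, num_bits: int = 64):
        super().__init__()
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        self.backbone.heads = nn.Identity()         # expose the 768-d class-token feature
        self.hash_layer = nn.Linear(768, num_bits)  # hash coding layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)
        return torch.tanh(self.hash_layer(feat))    # relaxed codes in (-1, 1)


def single_objective_loss(codes: torch.Tensor,
                          labels: torch.Tensor,
                          margin: float = 0.5,
                          quant_weight: float = 0.1) -> torch.Tensor:
    """Illustrative single objective: a pairwise similarity term plus a quantization penalty.

    `labels` is a multi-hot (batch, num_classes) float tensor; two images are treated as
    similar if they share at least one label. The actual loss in the paper may differ.
    """
    sim = (labels @ labels.t() > 0).float()                    # 1 for similar pairs, else 0
    cos = F.normalize(codes) @ F.normalize(codes).t()          # cosine similarity of relaxed codes
    pair_loss = (sim * (1.0 - cos) + (1.0 - sim) * F.relu(cos - margin)).mean()
    quant_loss = (codes.abs() - 1.0).pow(2).mean()             # push entries toward ±1
    return pair_loss + quant_weight * quant_loss
```

At retrieval time, the relaxed codes would be binarized with `torch.sign` and compared by Hamming distance; the quantization term keeps this binarization step from discarding too much information.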
