Abstract

Deep hashing methods have gained popularity in image retrieval because of their low storage requirements and high efficiency. However, existing deep hashing methods for large-scale image retrieval suffer from low discriminative power of binary hash codes, difficult loss optimization, and low retrieval accuracy. This paper proposes a single-loss hash image retrieval method based on an improved Vision Transformer to address these issues. The proposed method uses a ViT pre-trained on ImageNet as the backbone network, augmented with a hash coding layer, to extract image features more comprehensively. In addition, we design a single-objective loss function that jointly accounts for the discriminative power of the hash codes and the quantization error, eliminating the need to balance weights among multiple losses. Experimental evaluations on the ImageNet100, NUS-WIDE, CIFAR10, and MS-COCO datasets demonstrate that the proposed method outperforms contemporary methods, indicating its adaptability to diverse data.
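To make the described architecture concrete, the sketch below shows one way the pipeline could be assembled in PyTorch: a pre-trained ViT backbone with its classification head replaced by a hash coding layer, and a single combined objective consisting of a pairwise similarity term plus a quantization penalty. The abstract does not give the exact loss formulation, layer sizes, or hyperparameters, so every name and constant here (e.g., `ViTHash`, `single_objective_loss`, the margin and quantization weight) is an illustrative assumption rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vit_b_16, ViT_B_16_Weights


class ViTHash(nn.Module):
    """ViT backbone pre-trained on ImageNet followed by a hash coding layer (illustrative)."""

    def __init__(self, num_bits: int = 64):
        super().__init__()
        self.backbone = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
        self.backbone.heads = nn.Identity()         # expose the 768-d class-token feature
        self.hash_layer = nn.Linear(768, num_bits)  # hash coding layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)
        return torch.tanh(self.hash_layer(feat))    # relaxed codes in (-1, 1)


def single_objective_loss(codes: torch.Tensor,
                          labels: torch.Tensor,
                          margin: float = 0.5,
                          quant_weight: float = 0.1) -> torch.Tensor:
    """Illustrative single objective: a pairwise similarity term plus a quantization penalty.

    `labels` is a multi-hot (batch, num_classes) float tensor; two images are treated as
    similar if they share at least one label. The actual loss in the paper may differ.
    """
    sim = (labels @ labels.t() > 0).float()                    # 1 for similar pairs, else 0
    cos = F.normalize(codes) @ F.normalize(codes).t()          # cosine similarity of relaxed codes
    pair_loss = (sim * (1.0 - cos) + (1.0 - sim) * F.relu(cos - margin)).mean()
    quant_loss = (codes.abs() - 1.0).pow(2).mean()             # push entries toward ±1
    return pair_loss + quant_weight * quant_loss
```

At retrieval time, the relaxed codes would be binarized with `torch.sign` and compared by Hamming distance; the quantization term keeps this binarization step from discarding too much information.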
