Abstract

Learning-based hashing has been widely used for approximate nearest neighbor search in image retrieval. However, most existing hashing methods are designed to learn only feature similarity while ignoring the location similarity among multiple objects, and thus cannot work well on multi-label image retrieval tasks. In this paper, we propose a novel supervised hashing method which fuses the two kinds of similarity. First, we leverage an adjacency matrix to record the relative location relationships among multiple objects. Second, by incorporating the matrix discretization difference and the image label difference, we re-define the pairwise image similarity in a more fine-grained way. Third, to learn more distinguishable hash codes, we leverage an attention sub-network to identify the approximate regions of the objects in an image, so that the extracted features mainly focus on the foreground objects and ignore background clutter. The loss function in our method consists of a multi-class classification loss, used to learn the attention sub-network, and a hash loss with a scaled sigmoid function, used to learn efficient hash codes. Experimental results show that our proposed method is effective in preserving high-level similarities and outperforms the baseline methods in multi-label image retrieval.
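To make the fused similarity concrete, here is a minimal sketch of the two ingredients the abstract names: an adjacency matrix recording relative object locations, and a pairwise similarity that combines label difference with adjacency-matrix difference. The quadrant quantization, the Jaccard label similarity, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact discretization or weighting.

```python
import numpy as np

def location_adjacency(boxes):
    """Encode the relative location relationship among objects as an
    adjacency matrix: entry (i, j) is a quantized direction code from
    object i's center to object j's center (0 = self, 1-4 = quadrant).
    Illustrative quantization; the paper's discretization may differ."""
    centers = np.array([[(x1 + x2) / 2, (y1 + y2) / 2]
                        for x1, y1, x2, y2 in boxes])
    n = len(centers)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = centers[j] - centers[i]
            adj[i, j] = 1 + (dx < 0) + 2 * (dy < 0)  # quadrant code 1..4
    return adj

def pairwise_similarity(labels_a, labels_b, adj_a, adj_b, alpha=0.5):
    """Fuse label similarity (Jaccard over label sets) with location
    similarity (fraction of matching adjacency entries); alpha is an
    assumed mixing weight."""
    label_sim = len(labels_a & labels_b) / len(labels_a | labels_b)
    k = min(len(adj_a), len(adj_b))
    loc_sim = float(np.mean(adj_a[:k, :k] == adj_b[:k, :k])) if k > 1 else 1.0
    return alpha * label_sim + (1 - alpha) * loc_sim
```

Two images sharing labels but with objects arranged differently would then receive a lower similarity than a label-only definition would assign, which is the behavior the re-defined similarity is meant to capture.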

Highlights

  • Due to the ubiquity of social media, the amount of web images has witnessed a dramatic increase in the past decade

  • We propose a novel definition of pairwise similarity of multi-label images which incorporates the location relationship among multiple objects in the image

  • We propose an attention-aware joint location constraint hashing framework with three parts: (i) a novel similarity definition for multi-label image pairs; (ii) a deep feature extraction network with an attention mechanism to extract an attentive feature representation of an input image; (iii) a hash layer with a scaled sigmoid function, which is more sensitive to the Hamming distance between hash codes and transforms the attentive feature representations into hash vectors depending on both label similarity and location relationship similarity


Summary

INTRODUCTION

Due to the ubiquity of social media, the amount of web images has witnessed a dramatic increase in the past decade. We propose a novel definition of pairwise similarity for multi-label images which incorporates the location relationship among multiple objects in the image.

THE PROPOSED ATTENTION-AWARE JOINT LOCATION CONSTRAINT HASHING

There are three parts in our proposed framework: (i) a novel similarity definition for multi-label image pairs; (ii) a deep feature extraction network with an attention mechanism to extract an attentive feature representation of an input image; (iii) a hash layer with a scaled sigmoid function, which is more sensitive to the Hamming distance between hash codes and transforms the attentive feature representations into hash vectors depending on both label similarity and location relationship similarity. The closer s_ij gets to 1, the lower the Hamming distance between h_i and h_j, and vice versa
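The scaled sigmoid in the hash layer can be sketched as follows. For K-bit codes in {-1, +1}^K, the normalized inner product relates linearly to the Hamming distance, so a cross-entropy loss that pushes a scaled sigmoid of the inner product toward s_ij pulls similar pairs close and dissimilar pairs apart in Hamming space. The value of the scale `beta` and the exact cross-entropy form are illustrative assumptions; the paper's loss may differ in detail.

```python
import numpy as np

def scaled_sigmoid(x, beta=10.0):
    """Scaled sigmoid: a larger beta sharpens the curve, making the
    loss more sensitive to small changes in the code inner product
    (and hence in the Hamming distance). beta is an assumed value."""
    return 1.0 / (1.0 + np.exp(-beta * x))

def pairwise_hash_loss(h_i, h_j, s_ij, beta=10.0):
    """Pairwise cross-entropy style hash loss: drives the scaled
    sigmoid of the normalized inner product toward the target
    similarity s_ij in [0, 1]. Illustrative formulation."""
    inner = np.dot(h_i, h_j) / len(h_i)   # in [-1, 1]
    p = scaled_sigmoid(inner, beta)
    eps = 1e-12                           # numerical safety for log
    return -(s_ij * np.log(p + eps) + (1 - s_ij) * np.log(1 - p + eps))
```

With identical codes and s_ij = 1 the loss is near zero, while the same pair with s_ij = 0 is heavily penalized, matching the stated behavior that a higher s_ij enforces a lower Hamming distance.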

DEEP FEATURE EXTRACTION NETWORK WITH ATTENTION MECHANISM
EVALUATION METRICS
COMPARISONS WITH THE STATE-OF-THE-ART METHODS
Findings
CONCLUSION