Hashing has been widely used for large-scale remote sensing image retrieval because of its advantages in storage and search speed. Recently, deep hashing methods, which produce discriminative hash codes with end-to-end deep convolutional networks, have shown promising results. However, training these networks requires numerous labeled images, which are scarce and expensive to obtain for remote sensing datasets. To address this problem, we propose a deep unsupervised hashing method, namely deep contrastive self-supervised hashing (DCSH), which learns accurate hash codes from unlabeled images alone. It eliminates the need for label annotation by maximizing the consistency between different views generated from the same image. More specifically, we assume that hash codes generated from different views of the same image are similar, while those generated from different images are dissimilar. Based on this hypothesis, we develop a novel loss function combining a temperature-scaled cross-entropy loss and a quantization loss to train the network end-to-end, yielding hash codes that preserve semantic similarity. The proposed network consists of four parts. First, each image is transformed into two different views by data augmentation. The two views are then fed into an encoder with shared parameters to obtain deep discriminative features. Next, a hash layer converts the high-dimensional image representations into compact binary codes. Lastly, the proposed loss function trains the network end-to-end and guides the generated hash codes to preserve semantic similarity. Extensive experiments on two popular benchmark datasets, the UC Merced Land Use Database and the Aerial Image Dataset, demonstrate that DCSH significantly outperforms state-of-the-art unsupervised hashing methods in remote sensing image retrieval.
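To make the objective concrete, the sketch below shows one plausible form of the kind of loss the abstract describes: a temperature-scaled cross-entropy (NT-Xent-style) term on hash-layer outputs from two augmented views of each image, plus a quantization term pushing the continuous codes toward binary values. This is not the authors' released implementation; the temperature, loss weighting, and the `encoder`/`hash_layer` names in the usage comment are illustrative assumptions.

```python
# Minimal sketch of a contrastive hashing objective (assumed form, not the paper's code).
import torch
import torch.nn.functional as F

def contrastive_hash_loss(h1, h2, temperature=0.5, quant_weight=0.1):
    """h1, h2: (N, bits) continuous hash-layer outputs for two views of N images."""
    # Temperature-scaled cross-entropy over cosine similarities (NT-Xent-style).
    z1, z2 = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, bits)
    sim = z @ z.t() / temperature             # pairwise similarity logits
    sim.fill_diagonal_(float('-inf'))         # an example is never its own negative
    n = h1.size(0)
    # For sample i in the first half, its positive is i + n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(h1.device)
    nt_xent = F.cross_entropy(sim, targets)
    # Quantization loss: encourage continuous codes to lie near {-1, +1}.
    quant = ((h1.abs() - 1) ** 2).mean() + ((h2.abs() - 1) ** 2).mean()
    return nt_xent + quant_weight * quant

# Hypothetical usage with a shared-parameter encoder and hash layer:
#   h1 = hash_layer(encoder(view1)); h2 = hash_layer(encoder(view2))
#   loss = contrastive_hash_loss(h1, h2); loss.backward()
```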