Abstract

With the rapid progress of Earth observation technology, cross-modal remote sensing (RS) image-sound retrieval has attracted considerable attention in the field of RS data processing. Existing approaches usually learn only pairwise similarity relations between RS images and sounds. However, these approaches ignore relative semantic similarity relationships, which leads to poor cross-modal RS image-sound retrieval performance. In this article, we address this dilemma with a novel deep quadruple-based hashing (DQH) approach. We first devise a novel quadruple-based hashing network to learn the relative semantic similarity relationships of hash codes. Meanwhile, we propose a hard quadruple construction module, which randomly selects two hard triplet units so that relative semantic similarity relationships can be learned directly. Building on these two components, we develop a new objective function for effective hash code learning. The new objective function not only captures the relative semantic correlation of hash codes across different modalities and learns the relative semantic correlation of deep features, but also enhances the category-level semantics of hash codes and reduces the quantization error between hash-like codes and binary hash codes. The reasonableness and effectiveness of the proposed architecture are well illustrated by comprehensive experiments on diverse RS image-sound datasets.
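To make the relative-similarity idea concrete, the sketch below shows one plausible form of such a quadruple objective, assuming a standard quadruplet-style margin loss over continuous hash-like codes (e.g., tanh network outputs) plus a quantization penalty. The function name, margin values, and distance choice are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def quadruple_hash_loss(img_anchor, snd_pos, snd_neg, img_neg2,
                        margin1=0.5, margin2=0.25, quant_weight=0.1):
    """Sketch of a quadruplet-style loss for cross-modal hashing.

    Inputs are real-valued hash-like codes in [-1, 1]. The first term pulls a
    matched image-sound pair together while pushing the anchor away from a
    non-matching sound; the second term additionally separates a pair of
    negatives from each other, encoding the *relative* similarity order
    d(anchor, positive) < d(negative, negative). The last term penalizes the
    gap between hash-like codes and their sign-binarized hash codes.
    """
    d_ap = F.pairwise_distance(img_anchor, snd_pos)   # matched cross-modal pair
    d_an = F.pairwise_distance(img_anchor, snd_neg)   # cross-modal negative
    d_nn = F.pairwise_distance(img_neg2, snd_neg)     # negative-negative pair

    triplet_term = F.relu(d_ap - d_an + margin1).mean()
    relative_term = F.relu(d_ap - d_nn + margin2).mean()

    # Quantization error between hash-like codes and binary hash codes.
    codes = torch.cat([img_anchor, snd_pos, snd_neg, img_neg2], dim=0)
    quant_term = (codes - torch.sign(codes)).pow(2).mean()

    return triplet_term + relative_term + quant_weight * quant_term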
