With the explosive growth of multimedia content, multimedia retrieval faces unprecedented challenges in both storage cost and retrieval speed. Hashing techniques project high-dimensional data into compact binary hash codes. With them, the most time-consuming step of multimedia retrieval, semantic similarity computation, can be significantly accelerated by fast Hamming distance computation, while the binary embedding greatly reduces storage cost. In light of this, multi-modal hashing has recently received considerable attention as a means of supporting large-scale multimedia retrieval. Unlike uni-modal hashing, multi-modal hashing focuses on modeling multi-modal semantics and preserving them in binary hash codes through hash learning. In this paper, we first systematically review existing learning-to-hash methods for efficient multimedia retrieval, categorizing them by multimedia retrieval task, multi-modal semantic modeling technique, and hash learning strategy. We then present performance comparison results. Finally, we discuss the challenges and potential research directions that merit further investigation in multi-modal hash learning. To facilitate research on multi-modal hashing, we provide an open-source performance comparison tool at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/BMC-SDNU/Hashing-Retrieval</uri> .
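The speed-up the abstract refers to can be sketched as follows. This is a minimal, illustrative example (not from the paper): it uses a classic random-hyperplane hashing scheme as a stand-in for a learned hash function, packs the resulting bits into an integer, and computes Hamming distance with XOR plus popcount. All names (`random_projection_hash`, `hamming`) and parameters are hypothetical.

```python
# Illustrative sketch: binary hash codes turn similarity search into
# XOR + popcount over packed bits, which is far cheaper than computing
# distances between high-dimensional real-valued vectors.
import random

def random_projection_hash(vector, hyperplanes):
    """Hash a real-valued vector to an integer whose bits are the signs of
    projections onto random hyperplanes (an LSH-style scheme, standing in
    here for the learned hash functions surveyed in the paper)."""
    code = 0
    for h in hyperplanes:
        code = (code << 1) | (sum(v * w for v, w in zip(vector, h)) >= 0)
    return code

def hamming(a, b):
    """Hamming distance between two packed binary codes: XOR, then popcount."""
    return bin(a ^ b).count("1")

random.seed(0)
dim, bits = 32, 16
hyperplanes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

query = [random.gauss(0, 1) for _ in range(dim)]
near = [q + random.gauss(0, 0.05) for q in query]  # slightly perturbed copy
far = [random.gauss(0, 1) for _ in range(dim)]     # unrelated vector

d_near = hamming(random_projection_hash(query, hyperplanes),
                 random_projection_hash(near, hyperplanes))
d_far = hamming(random_projection_hash(query, hyperplanes),
                random_projection_hash(far, hyperplanes))
print(d_near, d_far)  # the near neighbor typically gets the smaller distance
```

In a real multi-modal hashing system the random hyperplanes would be replaced by hash functions learned to preserve cross-modal semantics, but the retrieval-time arithmetic is the same bitwise comparison shown above.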