Learning with non-negative matrix factorization (NMF) has significantly benefited many fields, such as information retrieval, computer vision, natural language processing, biomedicine, and neuroscience. However, little NMF research has touched hashing, a powerful tool for approximate nearest neighbor search that offers economical storage and efficient hardware-level XOR operations. To explore this direction, we propose a novel hashing model, called Regularized Semi-NMF for Hashing (SeH), which jointly optimizes Semi-NMF, semantics preservation, and efficient coding. Techniques such as code balancing, binary-like relaxation, and stochastic learning are employed to yield efficient algorithms that scale to large datasets. SeH is shown to clearly improve retrieval effectiveness over several state-of-the-art baselines on public datasets (MSRA-CFW, Caltech256, Cifar10, and ImageNet) with different sample scales and feature representations. Furthermore, a case study on Caltech256, in which three image queries are randomly selected and the corresponding search results are presented, intuitively illustrates which method performs better.