Among the existing hashing methods, the Self-taught hashing (STH) is regarded as the state-of-the-art work. However, it still suffers the problem of semantic loss, which mainly comes from the fact that the original optimization objective of in-sample data is NP-hard and therefore is compromised into the combination of Laplacian Eigenmaps (LE) and binarization. Obviously, the shape associated with the embedding of LE is quite dissimilar to that of binary code. As a result, binarization of the LE embedding readily leads to significant semantic loss. To overcome this drawback, we combine the constrained nonnegative sparse coding and the Support Vector Machine (SVM) to propose a new hashing method, called nonnegative sparse coding induced hashing (NSCIH). Here, nonnegative sparse coding is exploited for seeking a better intermediate representation, which can make sure that the binarization can be smoothly conducted. In addition, we build an image copy detection scheme based on the proposed hashing methods. The extensive experiments show that the NSCIH is superior to the state-of-the-art hashing methods. At the same time, this copy detection scheme can be used for performing copy detection over very large image database.
Read full abstract