Abstract

Several hashing-based methods for Approximate Nearest Neighbors (ANN) search in large data sets have been proposed recently. In particular, semi-supervised hashing utilizes semantic similarity given for a small fraction of pairwise data samples, and active hashing aims to improve ANN search performance by relying on an expert to label the most informative points. In this study, we present an active hashing method based on prototype-based sample selection. Knowing the semantic similarities between cluster prototypes can help extract relations among the points in the corresponding clusters. For expert labeling, we select prototypes from clusters that do not contain any labeled data points, so that all areas can be covered effectively. Experimental results demonstrate that the proposed active hashing method improves ANN search performance.
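The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster assignments are already available, that labeled information is given as (i, j, label) tuples, and that a prototype is the cluster member closest to the cluster mean. The function names `uncovered_clusters`, `prototype_index`, and `select_queries` are hypothetical.

```python
def uncovered_clusters(assignments, labeled_pairs):
    """Return ids of clusters that contain no point appearing in a labeled pair."""
    # Each labeled pair is assumed to be (i, j, label); only the indices matter here.
    labeled_points = {i for pair in labeled_pairs for i in pair[:2]}
    covered = {assignments[i] for i in labeled_points}
    return sorted(set(assignments) - covered)


def prototype_index(points, assignments, cluster_id):
    """Pick the member closest to the cluster mean (an assumed prototype definition)."""
    members = [i for i, c in enumerate(assignments) if c == cluster_id]
    dim = len(points[0])
    mean = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
    return min(
        members,
        key=lambda i: sum((points[i][d] - mean[d]) ** 2 for d in range(dim)),
    )


def select_queries(points, assignments, labeled_pairs):
    """Return prototype indices of clusters without labeled points, as labeling candidates."""
    return [
        prototype_index(points, assignments, c)
        for c in uncovered_clusters(assignments, labeled_pairs)
    ]


# Toy data: three clusters; only cluster 1 has a labeled (must-link) pair.
points = [(0, 0), (2, 0), (1, 0), (5, 5), (5, 6), (10, 10), (10, 12), (10, 11)]
assignments = [0, 0, 0, 1, 1, 2, 2, 2]
labeled_pairs = [(3, 4, 1)]
queries = select_queries(points, assignments, labeled_pairs)  # [2, 7]
```

In this toy case the prototypes of clusters 0 and 2 are returned, since neither cluster contains a labeled point. How the expert's answers on these prototypes are propagated to other cluster members is not covered by the sketch.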

Highlights

  • As very large data collections become easier to obtain, efficient methods for nearest neighbors search are needed in areas such as data mining and pattern recognition (Shakhnarovich et al., 2006)

  • Unlike the data-independent hashing of Locality Sensitive Hashing (LSH), data-dependent hashing methods such as Spectral Hashing (SH) (Weiss et al., 2008) and Binary Reconstructive Embedding (BRE) (Kulis et al., 2009) learn hash functions from training data so that points that are similar in the original space are mapped to nearby points in the binary embedding space

  • We present an active hashing method by prototype-based sample selection


Summary

Introduction

As very large data collections become easier to obtain, efficient methods for nearest neighbors search are needed in areas such as data mining and pattern recognition (Shakhnarovich et al., 2006). Hashing-based methods map data points to k-bit binary codes and search for nearest neighbors in the resulting binary embedding space. Semi-supervised hashing utilizes semantic similarity that is given for a fraction of pairwise data samples in terms of two categories of relations: must-link and cannot-link (Wang et al., 2012; Mu et al., 2010). We present an active hashing method based on prototype-based sample selection. Assuming that clusters and their prototypes have been found, it is well known that prototypes can be used to find nearest neighbors efficiently (Tan et al., 2014).
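To make the idea of searching in a binary embedding space concrete, the following sketch maps points to k-bit codes and ranks candidates by Hamming distance. It uses random hyperplanes, i.e. data-independent hashing in the style of LSH, purely as a stand-in; the learned hash functions of semi-supervised or active hashing are not reproduced here, and all function names are hypothetical.

```python
import random


def make_hyperplanes(dim, k, seed=0):
    """Draw k random hyperplanes (a stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def hash_code(x, hyperplanes):
    """Map a point to a k-bit code stored in an int, one sign bit per hyperplane."""
    code = 0
    for w in hyperplanes:
        bit = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        code = (code << 1) | bit
    return code


def hamming(a, b):
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest_by_hamming(query, data, hyperplanes, top=3):
    """Return indices of the `top` data points whose codes are closest to the query's code."""
    q = hash_code(query, hyperplanes)
    codes = [hash_code(x, hyperplanes) for x in data]
    order = sorted(range(len(data)), key=lambda i: hamming(q, codes[i]))
    return order[:top]


# Toy usage with 2-D points and 8-bit codes.
planes = make_hyperplanes(dim=2, k=8)
data = [(1.0, 2.0), (-3.0, 0.5), (0.9, 2.1), (4.0, -1.0)]
neighbors = nearest_by_hamming((1.0, 2.0), data, planes, top=2)
```

Comparing integer codes with XOR and a bit count is what makes the search cheap relative to computing distances in the original space; the quality of the returned neighbors depends on how well the hash functions preserve similarity.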

Related Works
Experimental Results for Active Hashing
Conclusion
