Abstract

Similarity detection has attracted considerable research attention. Minwise hashing schemes have become an active research topic in machine learning for similarity preservation. Their basic idea is to transform the original data, during the preprocessing stage, into binary codes that serve as good proxies for the original data while preserving similarity. Minwise hashing schemes can improve computational efficiency and save storage space without notable loss of accuracy, and they have therefore been studied extensively and developed rapidly for decades. Given the number of minwise hashing algorithms and variants, a systematic survey is needed to make this family of data preprocessing techniques easier to understand and use. The purpose of this paper is to review minwise hashing algorithms in detail and provide an insightful understanding of current developments. To show the application prospects of minwise hashing, various algorithms are combined with a linear Support Vector Machine for large-scale classification. Both theoretical analysis and experimental results demonstrate that these algorithms can achieve substantial advantages in accuracy, efficiency, and energy consumption. Furthermore, their limitations, major opportunities and challenges, extensions and variants, and potentially important research directions are pointed out.
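As a rough illustration of the basic idea, the following minimal sketch shows how a minwise hashing signature can stand in for a set when estimating similarity. It is not taken from the paper. The function names, the linear hash family `(a * x + b) mod PRIME`, and the choice of `PRIME` are assumptions made for this example, and the sketch keeps the full integer minima rather than the compact binary codes that the surveyed variants produce.

```python
import random

# Assumption: items are non-negative integers smaller than PRIME.
PRIME = (1 << 61) - 1  # a Mersenne prime used as the hash modulus


def make_hash_params(k, seed=0):
    """Draw k pairs (a, b) defining hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, params):
    """Return the signature: the minimum hash value of the set under each function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(s, t):
    """Exact Jaccard similarity |s & t| / |s | t|, for comparison."""
    return len(s & t) / len(s | t)


if __name__ == "__main__":
    params = make_hash_params(k=256, seed=1)
    doc_a = {3, 17, 42, 99, 256, 1024, 4096}
    doc_b = {3, 17, 42, 100, 257, 1024, 5000}
    sig_a = minhash_signature(doc_a, params)
    sig_b = minhash_signature(doc_b, params)
    print("exact:   ", exact_jaccard(doc_a, doc_b))
    print("estimate:", estimate_similarity(sig_a, sig_b))
```

Each set is reduced to a fixed-length signature of `k` values, so two sets can be compared in time proportional to `k` regardless of their original size. A larger `k` lowers the variance of the estimate at the cost of more storage.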
