A Ranking-Based Hashing Algorithm Based on the Distributed Spark Platform

Anbang Yang, Huahui Chen, Yihong Dong, Jiangbo Qian

doi:10.3390/info11030148

Abstract

With the rapid development of modern society, the volume of generated data has grown exponentially, and finding the required data in this huge pool is an urgent problem. Hashing is widely used for similarity search over large-scale data. Among hashing methods, ranking-based hashing algorithms have been widely studied because of the accuracy and speed of their search results. At present, most ranking-based hashing algorithms construct loss functions by comparing the rank consistency of data in Euclidean and Hamming spaces. However, most of them have high time complexity and long training times, so they cannot meet practical requirements. To address these problems, this paper introduces the distributed Spark framework and implements a ranking-based hashing algorithm in a parallel environment on multiple machines. The experimental results show that Spark-RLSH (Ranking Listwise Supervision Hashing) greatly reduces training time and improves training efficiency compared with other ranking-based hashing algorithms.
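The idea of rank consistency between Euclidean and Hamming spaces can be illustrated with a minimal sketch. It is not the Spark-RLSH algorithm from the paper: the sign-of-projection hash, the function names (`hash_codes`, `rank_consistency`), and the pairwise violation count are all assumptions chosen for illustration, and the count is only a simplified stand-in for a ranking loss.

```python
import random


def euclidean_sq(a, b):
    """Squared Euclidean distance between two real-valued vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def hash_codes(points, planes):
    """Illustrative hash (an assumption): one bit per hyperplane,
    set to 1 when the projection onto that hyperplane is non-negative."""
    return [
        [1 if sum(w * x for w, x in zip(plane, p)) >= 0 else 0 for plane in planes]
        for p in points
    ]


def rank_consistency(query_idx, points, codes):
    """Share of item pairs (i, j) whose order relative to the query is not
    reversed when Euclidean distance is replaced by Hamming distance.
    Hamming ties are not counted as violations."""
    q_point, q_code = points[query_idx], codes[query_idx]
    others = [i for i in range(len(points)) if i != query_idx]
    total = violations = 0
    for i in others:
        for j in others:
            # Consider only pairs where item i is strictly closer in Euclidean space.
            if euclidean_sq(q_point, points[i]) < euclidean_sq(q_point, points[j]):
                total += 1
                if hamming(q_code, codes[i]) > hamming(q_code, codes[j]):
                    violations += 1
    return 1.0 if total == 0 else 1.0 - violations / total


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_bits, n_points = 8, 16, 50
    points = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_points)]
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    codes = hash_codes(points, planes)
    print("rank consistency for query 0:", rank_consistency(0, points, codes))
```

A ranking-based method would learn the hash so that this kind of consistency is high for many queries. Because each query can be scored independently, the evaluation parallelizes naturally, which is the property a distributed platform can exploit.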

Highlights

With the continuous development of computing and digital media technology in recent years, the amount of data generated grows every day.
The other method is a hashing-based search method. This method is divided into two categories, with one being the data-independent method and the other being locality-sensitive hashing (LSH) [5,6].
At present, the complexity of most ranking-based hashing algorithms does not meet existing training requirements, while the distributed Spark platform can execute the algorithm flow in parallel and shorten the training time.

Summary

Introduction

With the continuous development of computing and digital media technology in recent years, the amount of data generated grows every day. This data exists in many forms, including text, images, audio, and video. The other method is a hashing-based search method. This method is divided into two categories, with one being the data-independent method and the other being locality-sensitive hashing (LSH) [5,6]. Some current hashing algorithms take too long to meet the search requirements of the big data environment.

Journal: Information	Publication Date: Mar 9, 2020
Citations: 1	License type: CC BY 4.0
