Locality-Sensitive Hashing for Information Retrieval System on Multiple GPGPU Devices

Toan Nguyen Mau, Yasushi Inoguchi

doi:10.3390/app10072539

Open Access



Abstract

It is challenging to build a real-time information retrieval system, especially for high-dimensional big data. To structure big data, many hashing algorithms have been proposed that map similar data items to the same bucket in order to accelerate the search. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set by using a family of hash functions and a hash table. The LSH hash table is an additional component that indexes the hash values (keys) of the corresponding data items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm, which uses a dynamically structured hash table optimized for storage in main memory and in General-Purpose computation on Graphics Processing Units (GPGPU) memory. This structure supports constantly updated data sets, such as song, image, or text databases. The DLSH algorithm works effectively with data sets that are updated at high frequency and is compatible with parallel processing. However, a single GPGPU device is inadequate for processing big data because of its small memory capacity. When multiple GPGPU devices are used for searching, an effective search algorithm is needed to balance the jobs among them. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. We also propose different search strategies on multiple DLSH clusters to suit our parallelized system. Based on significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
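To make the bucket-and-hash-table idea concrete, the following is a minimal Python sketch of one standard LSH family, sign random projection (random hyperplanes) for angular similarity. It is not the DLSH algorithm or the authors' GPGPU code; the class name `HyperplaneLSH`, its methods, and the parameters `num_bits` and `seed` are hypothetical. Each of the `num_bits` hash functions contributes one bit (the sign of a dot product with a random Gaussian vector), the bits are concatenated into a bucket key, and a dictionary serves as the hash table. The `insert` and `remove` methods hint at why a dynamically structured table suits frequently updated data.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Sign-random-projection LSH with a single dictionary-backed hash table.

    Hypothetical illustration only; this is not the DLSH structure from the paper.
    """

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per hash function (one bit each).
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.table = defaultdict(list)  # bucket key -> list of item ids
        self.items = {}                 # item id -> stored vector

    def key(self, vec):
        """Concatenate the sign bits of all projections into an integer key."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def insert(self, item_id, vec):
        self.items[item_id] = vec
        self.table[self.key(vec)].append(item_id)

    def remove(self, item_id):
        """Delete an item in place, as a frequently updated data set requires."""
        bucket = self.key(self.items.pop(item_id))
        self.table[bucket].remove(item_id)
        if not self.table[bucket]:
            del self.table[bucket]  # drop empty buckets to keep the table compact

    def query(self, vec, k=1):
        """Rank only the items that share the query's bucket, by squared L2 distance."""
        candidates = self.table.get(self.key(vec), [])

        def dist2(item_id):
            return sum((a - b) ** 2 for a, b in zip(self.items[item_id], vec))

        return sorted(candidates, key=dist2)[:k]


index = HyperplaneLSH(dim=3, num_bits=4, seed=42)
index.insert("a", [1.0, 0.0, 0.0])
index.insert("b", [0.0, 1.0, 0.0])
print(index.query([1.0, 0.0, 0.0]))  # ['a']: an identical vector always hashes to its own bucket
```

A single table with few bits can miss true neighbors that fall into adjacent buckets; practical LSH indexes usually combine several tables, which this sketch omits for brevity.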

Highlights

With the development of digital content, the typical volume of a database has been growing ever larger
We propose an extension of Dynamic Locality-Sensitive Hashing (DLSH) for big data sets using multiple General-Purpose computation on Graphics Processing Units (GPGPU), in order to increase the capacity and performance of the information retrieval system
Although there have been several studies using hashing on GPGPU for handling the k-Nearest Neighbors (kNN) search problem [17], our study provides a new approach for optimizing a parallel Locality-Sensitive Hashing (LSH) algorithm using multiple GPGPUs
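The multi-GPGPU setting can be illustrated with a scatter-gather pattern: the data set is split across devices, each device searches its own shard, and the partial results are merged on the host. The Python sketch below only simulates this on the CPU, with threads standing in for GPGPU devices and an exact scan standing in for a per-device DLSH search. The function names `shard_round_robin`, `local_top_k`, and `multi_device_search` and the round-robin assignment are assumptions for illustration, not the search strategies proposed in the paper.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def shard_round_robin(items, num_devices):
    """Assign (item_id, vector) pairs to devices in turn; shard sizes differ by at most one."""
    shards = [[] for _ in range(num_devices)]
    for i, item in enumerate(items):
        shards[i % num_devices].append(item)
    return shards


def local_top_k(shard, query, k):
    """Stand-in for a per-device kernel: exact squared-L2 top-k over one shard."""
    scored = ((sum((a - b) ** 2 for a, b in zip(vec, query)), item_id)
              for item_id, vec in shard)
    return heapq.nsmallest(k, scored)


def multi_device_search(shards, query, k):
    """Search every shard concurrently, then merge the partial top-k lists on the host."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: local_top_k(s, query, k), shards))
    merged = heapq.nsmallest(k, (hit for part in partials for hit in part))
    return [item_id for _, item_id in merged]


data = [(f"v{i}", [float(i), 0.0]) for i in range(10)]
shards = shard_round_robin(data, num_devices=3)
print([len(s) for s in shards])                        # [4, 3, 3]
print(multi_device_search(shards, [2.2, 0.0], k=3))    # ['v2', 'v3', 'v1']
```

Round-robin placement keeps shard sizes within one item of each other, which is one simple way to balance jobs across devices. Because every device returns its own top `k`, merging those lists yields the global top `k` for the exact scan used here; with an approximate per-device index, the merged result is only as good as each device's candidates.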

Summary

Introduction

With the development of digital content, the typical volume of a database has been growing ever larger. Many high-dimensional data sets, such as audio fingerprint, photo, and text data sets, must be constantly updated. Managing these data sets requires a suitable dynamic structure [1]. A variety of algorithms have been proposed for high-dimensional data, such as data clustering, dimensionality reduction, hashing, and data classification algorithms, in order to increase the speed of Nearest Neighbor Search (NNS) [2,3].


Journal: Applied Sciences
Publication Date: Apr 7, 2020
Citations: 7
License type: CC BY 4.0

Similar Papers

Enabling CUDA acceleration within virtual machines using rCUDA
Jose Duato ... Juan C Fernandez
01 Dec 2011

Emerging technology about GPGPU
Enhua Wu ... Youquan Liu
01 Nov 2008

Performance-Portable Distributed k-Nearest Neighbors using Locality-Sensitive Hashing and SYCL
Marcel Breyer ... Gregor Daiß
27 Apr 2021

A Hash Table Without Hash Functions, and How to Get the Most Out of Your Random Bits
William Kuszmaul
01 Oct 2022
