Abstract

Building a real-time information retrieval system is challenging, especially for high-dimensional big data. To structure big data, many hashing algorithms have been proposed that map similar data items to the same bucket in order to speed up the search. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set by using a family of hash functions and a hash table. The LSH hash table is an additional component that indexes the hash values (keys) of the corresponding data items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm with a dynamically structured hash table, optimized for storage in main memory and in General-Purpose computation on Graphics Processing Units (GPGPU) memory. This supports the handling of constantly updated data sets, such as song, image, or text databases. The DLSH algorithm works effectively with data sets that are updated with high frequency and is compatible with parallel processing. However, a single GPGPU device is inadequate for processing big data because of the small memory capacity of GPGPU devices, and searching across multiple GPGPU devices requires an effective search algorithm to balance the jobs. In this paper, we propose an extension of DLSH for big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. We also propose different search strategies on multiple DLSH clusters to suit our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
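The general LSH idea described above (a family of hash functions plus a hash table whose keys index the data items) can be illustrated with a minimal sketch. This is not the DLSH algorithm or its GPGPU layout: it assumes a random-hyperplane (sign projection) hash family, and the names `LSHTable`, `lsh_key`, and `make_hyperplanes` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: the sign of the dot product with the vector."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


class LSHTable:
    """Hypothetical hash table mapping LSH keys to buckets of item ids."""

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(set)

    def insert(self, item_id, vector):
        # Items with the same key land in the same bucket.
        self.buckets[lsh_key(vector, self.hyperplanes)].add(item_id)

    def remove(self, item_id, vector):
        # Supports data sets that change over time; empty buckets are dropped.
        key = lsh_key(vector, self.hyperplanes)
        bucket = self.buckets.get(key)
        if bucket is not None:
            bucket.discard(item_id)
            if not bucket:
                del self.buckets[key]

    def query(self, vector):
        # Candidate neighbors are the items sharing the query's bucket.
        return set(self.buckets.get(lsh_key(vector, self.hyperplanes), ()))


if __name__ == "__main__":
    table = LSHTable(num_bits=8, dim=4, seed=42)
    table.insert("song-1", [1.0, 2.0, 3.0, 4.0])
    # A vector pointing in the same direction hashes to the same bucket.
    print(table.query([2.0, 4.0, 6.0, 8.0]))
```

In this sketch, a query only inspects one bucket instead of the whole data set, which is where the search speed-up comes from; practical systems typically use several such tables to improve recall.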

Highlights

  • With the development of digital content, the typical volume of a database has grown steadily

  • We propose an extension of Dynamic Locality-Sensitive Hashing (DLSH) for big data sets using multiple General-Purpose computation on Graphics Processing Units (GPGPU), in order to increase the capacity and performance of the information retrieval system

  • Several studies have used hashing on GPGPU to handle the k-Nearest Neighbors (kNN) search problem [17]; our study provides a new approach for the optimization of a parallel Locality-Sensitive Hashing (LSH) algorithm using multiple GPGPUs


Summary

Introduction

With the development of digital content, the typical volume of a database has grown steadily. Many high-dimensional data sets, such as audio fingerprint, photo, and text data sets, must be constantly updated. Managing these data sets requires a suitable dynamic structure [1]. A variety of algorithms have been proposed for high-dimensional data, such as data clustering, dimensionality reduction, hashing, and data classification algorithms, in order to increase the speed of the Nearest Neighbor Search (NNS) [2,3].
