Spatial-aware data partition for distributed memory parallelization of ANN search in multimedia retrieval

Guilherme Andrade, Renato Ferreira, George Teodoro

doi:10.1016/j.parco.2022.102992

Abstract

Content-based multimedia retrieval (CBMR) applications are increasingly popular in online services that handle large volumes of data and are subject to high query rates. Although these applications may be complex, finding the nearest neighboring objects (multimedia descriptors) is typically their most time-consuming operation. To address this problem, several recent works have proposed distributed memory parallelizations of approximate nearest neighbors (ANN) search. These solutions employ a variety of ANN algorithms and different parallelization strategies. In this paper, we identify the parallelization strategies currently in use, Data Equal Split (DES) and Bucket Equal Split (BES), and systematically evaluate their performance. We also develop a framework that simplifies the deployment of ANN algorithms on distributed memory machines with customized parallelization or data partition strategies. We further propose a novel class of data partition/parallelization strategies that take the spatial proximity of the data into account. Our approaches (SABES and SABES++) improve data locality and system efficiency compared with DES and BES. For instance, SABES++ achieved speedups of 4.2× and 1.8× over DES and BES, respectively, in the baseline case (40 nodes). SABES and SABES++ also attained higher multi-node scalability, and their gains over DES and BES grow with the number of nodes: SABES++ is 14.5× faster than DES when 160 nodes are used.
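The abstract only names the partition strategies, so the following Python sketch is a rough illustration of why spatial locality matters, built on our own assumptions rather than the paper's definitions. DES is modeled as splitting the raw points evenly across nodes, with every query broadcast to all of them. BES is modeled as handing out index buckets (for example, coarse-quantizer cells) round-robin so that each node holds the same number. The spatial-aware policy is a hypothetical stand-in that keeps buckets with nearby centroids on the same node by sorting the centroids and cutting them into contiguous runs. It is not the SABES or SABES++ algorithm, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def des_partition(n_points: int, n_nodes: int) -> List[int]:
    """DES-style (assumed): split raw points into equal contiguous chunks.
    Each node indexes its own chunk, so every query must visit every node."""
    chunk = math.ceil(n_points / n_nodes)
    return [i // chunk for i in range(n_points)]


def bes_partition(n_buckets: int, n_nodes: int) -> List[int]:
    """BES-style (assumed): give each node the same number of index buckets,
    assigned round-robin without regard to where the buckets lie in space."""
    return [b % n_nodes for b in range(n_buckets)]


def spatial_partition(centroids: Sequence[Point], n_nodes: int) -> List[int]:
    """Hypothetical spatial-aware stand-in: order buckets by centroid coordinates
    and cut that order into equal contiguous runs, so neighbors share a node."""
    order = sorted(range(len(centroids)), key=lambda b: centroids[b])
    chunk = math.ceil(len(centroids) / n_nodes)
    owner = [0] * len(centroids)
    for rank, b in enumerate(order):
        owner[b] = rank // chunk
    return owner


def nodes_contacted(query: Point, centroids: Sequence[Point],
                    owner: Sequence[int], nprobe: int) -> Set[int]:
    """Return the set of nodes holding the nprobe buckets nearest to the query."""
    nearest = sorted(range(len(centroids)),
                     key=lambda b: math.dist(query, centroids[b]))[:nprobe]
    return {owner[b] for b in nearest}


if __name__ == "__main__":
    # 16 bucket centroids along a line, 4 nodes, each query probes 4 buckets.
    line = [(float(i), 0.0) for i in range(16)]
    bes = bes_partition(len(line), 4)
    spatial = spatial_partition(line, 4)
    q = (5.2, 0.0)
    print(len(nodes_contacted(q, line, bes, 4)),
          len(nodes_contacted(q, line, spatial, 4)))  # prints: 4 1
```

In this toy setup the round-robin assignment spreads the query's four nearest buckets over all four nodes, while the spatial assignment keeps them on a single node. Touching fewer nodes per query is the kind of locality gain the abstract attributes to spatial-aware partitioning; a real strategy must also balance load across nodes, which this sketch ignores.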
