A parallel content-based image retrieval system using Spark and Tachyon frameworks

Saliha Mezzoudj, Ali Behloul, Rachid Seghir, Yassmina Saadna

doi:10.1016/j.jksuci.2019.01.003

Open Access

Abstract

With the rapid growth of large-scale multimedia on the Internet, especially images, building Content-Based Image Retrieval (CBIR) systems for large-scale image collections has become a major challenge. One of the drawbacks associated with CBIR is its very long execution time. In this article, we propose a fast Content-Based Image Retrieval system using Spark (CBIR-S) targeting large-scale images. Our system is composed of two steps: (i) an image indexation step, in which we use the MapReduce distributed model on Spark to speed up the indexation process, together with a memory-centric distributed storage system called Tachyon to speed up write operations; and (ii) an image retrieval step, which we speed up by using a parallel k-Nearest Neighbors (k-NN) search method based on the MapReduce model implemented under Apache Spark, in addition to exploiting the caching mechanism of the Spark framework. We have shown, through a wide set of experiments, the effectiveness of our approach in terms of processing time.
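
The retrieval step described above follows a map/reduce pattern that can be illustrated without a cluster. The following is a minimal sketch in plain Python, not the authors' Spark implementation: the index is split into partitions, each partition computes its own k closest candidates (map), and the candidate lists are merged down to the global k closest (reduce). The function names, the Euclidean distance, the round-robin partitioning, and the toy 2-D descriptors are all assumptions made for illustration.

```python
import heapq
import math
from functools import reduce


def split_into_partitions(items, num_partitions):
    """Round-robin split, standing in for how a cluster would shard the index."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def local_top_k(partition, query, k):
    """Map step: one partition returns its k closest (distance, image_id) pairs."""
    scored = ((math.dist(vec, query), image_id) for image_id, vec in partition)
    return heapq.nsmallest(k, scored)


def merge_top_k(left, right, k):
    """Reduce step: merge two candidate lists and keep the k closest overall."""
    return heapq.nsmallest(k, left + right)


def parallel_knn(index, query, k, num_partitions=3):
    partitions = split_into_partitions(index, num_partitions)
    # Evaluated sequentially here; on a cluster each partition would be
    # processed by a different worker.
    candidates = [local_top_k(p, query, k) for p in partitions]
    return reduce(lambda a, b: merge_top_k(a, b, k), candidates, [])


# Hypothetical toy index: (image_id, feature_vector) pairs with made-up descriptors.
toy_index = [
    ("img_a", (0.0, 0.0)),
    ("img_b", (1.0, 1.0)),
    ("img_c", (5.0, 5.0)),
    ("img_d", (0.5, 0.0)),
    ("img_e", (9.0, 1.0)),
]

nearest = parallel_knn(toy_index, query=(0.0, 0.0), k=2)
print([image_id for _, image_id in nearest])  # ['img_a', 'img_d']
```

Because every partition keeps its own k best candidates, the merged result is the same as an exhaustive search over the whole index, regardless of how many partitions are used. Only k candidates per partition need to be exchanged between workers, which is what makes the pattern suitable for a distributed setting.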

Full Text