In-Memory Stream Indexing of Massive and Fast Incoming Multimedia Content

Stefanos Antaris, Dimitrios Rafailidis

doi:10.1109/tbdata.2017.2697441

Abstract

This article presents a media storm indexing mechanism, where media storms are defined as fast incoming batches. We propose an approximate media storm indexing mechanism to index and store massive image collections that arrive at a varying rate. To evaluate the proposed mechanism, two architectures are used: i) a baseline architecture, which follows a disk-based processing strategy, and ii) an in-memory architecture, which uses the Flink distributed stream processing framework. This study is the first in the literature to use an in-memory processing strategy for media storm indexing. In the experimental evaluation, conducted on two of the largest publicly available image datasets, with 80 M and 1 B images, a media storm generator is implemented to test the proposed mechanism under different indexing workloads, that is, images arriving in high volume and at different velocities, on the scale of $10^5$ and $10^6$ incoming images per second. The approximate media storm indexing mechanism achieves a significant speedup factor, equal to 26.32 on average, compared with conventional indexing techniques, while maintaining high search accuracy after the media storms have been indexed. Finally, the implementations of both architectures and of the media storm indexing mechanisms are made publicly available.

Full Text