An efficient parallel indexing structure for multi-dimensional big data using spark

Manar A. Elmeiligy,Ali I. El Desouky,Sally M. Elghamrawy

doi:10.1007/s11227-021-03718-3

Abstract

With the increasing daily production of data in recent years, indexing, storing and retrieving huge amounts of data have become a common problem, especially for multi-dimensional big data. Although R-tree has proved to be efficient for indexing multi-dimensional big data, the R-tree suffers from the curse of dimensionality problem. Many researchers continue to use the R-tree in their studies as it is the most famous tree-like structure for indexing multi-dimensional data. However, with increasing numbers of dimensions in multi-dimensional data the performance of R-Tree will decrease. This paper proposes a new indexing structure called Parallel Indexing System Structure based on Spark (ParISSS), which is an efficient system for indexing multi-dimensional big data, to overcome these problems. ParISSS introduces six types of computing nodes, the reception-node is used to insert and index data, the normal-node is used to store indexed data, the resolution-node is used to distribute a reception-index to a normal-node, the representative-node is used to receive queries from the user, and the reply-node and check-node are used to send the results to the user. We also introduced BR*-tree structure to improve the storing and searching processes. We present an extensive experimental evaluation of our system, comparing several indexing systems. The experimental results show that ParISSS outperforms other indexing systems.

Full Text