Improving the Run-Time of Space-Efficient n-Gram Data Structures Using Apache Spark.

Fotios Kounelis, Phivos Mylonas, Andreas Kanavos

doi:10.1007/978-3-030-78775-2_19

Abstract

Storing information in memory efficiently is one of the most significant challenges in computer science. The two main factors that make a data structure efficient are reduced space consumption and reduced time consumption. Many tools can reduce the run-time of a process, and Apache Spark is one of them: it is a computing framework that uses clusters to execute a process. The software has two key features: a directed acyclic graph (DAG) that maps the execution process, and resilient distributed datasets (RDDs), which allow large in-memory computations. To construct a data structure that is both space- and time-efficient, we utilize this framework. To demonstrate the efficacy of this software tool, we construct a space-efficient data structure and compare its run-time with and without Spark.

Full Text