Parallel Shortest Path Big Data Graph Computations of US Road Network Using Apache Spark: Survey, Architecture, and Evaluation

Yasir Arfat, Sugimiyanto Suma, Rashid Mehmood, Aiiad Albeshri

doi:10.1007/978-3-030-13705-2_8

Abstract

This chapter reports our continuing work on single source shortest path computations of big data road network graphs using Apache Spark. Smart applications and infrastructures increasingly rely on graph computations to model real-life problems. Big data is being generated from various sources such as the Internet of Things (IoT) and social media. Big data cannot be processed by traditional tools and technologies because of its defining properties: volume, velocity, veracity, and variety. The problems and relevant data are typically large and hence give rise to large graphs, which can be analyzed and solved using big data technologies. We use the US road network data, modelled as graphs, and calculate shortest paths between large sets of vertices in parallel. The experiments are performed on the Aziz supercomputer. We analyze Spark’s parallelization behavior by solving problems of varying graph sizes, i.e., various states of the USA (with over 58 million edges), and varying numbers of shortest path queries, up to one million. We achieve good performance and, as expected, the speedup depends on both the size of the data and the number of parallel nodes. We explain the system architecture for graph computing in Spark and provide a detailed review of the relevant work. We call our system the Big Data Shortest Path Graph Computing (BDSPG) system.

Full Text