Abstract

This chapter reports our continuing work on single-source shortest path computations on big data road network graphs using Apache Spark. Smart applications and infrastructures increasingly rely on graph computations to model real-life problems. Big data is being generated from various sources such as the Internet of Things (IoT) and social media. It cannot be processed by traditional tools and technologies because of its defining properties: volume, velocity, veracity, and variety. The problems and their data are typically large and hence give rise to large graphs, which can be analyzed and solved using big data technologies. We use the US road network data, modelled as graphs, and calculate shortest paths between large sets of vertices in parallel. The experiments are performed on the Aziz supercomputer. We analyze Spark’s parallelization behavior by solving problems of varying graph sizes, i.e., various states of the USA (with over 58 million edges), and varying numbers of shortest path queries, up to one million. We achieve good performance and, as expected, the speedup depends on both the size of the data and the number of parallel nodes. We explain the system architecture for graph computing in Spark and provide a detailed review of the relevant work. We call our system the Big Data Shortest Path Graph Computing (BDSPG) system.
