Abstract

Big data is being generated from various sources such as the Internet of Things (IoT) and social media. Big data cannot be processed by traditional tools and technologies because of its defining properties: volume, velocity, veracity, and variety. Graphs are becoming increasingly popular for modelling real-world problems; these problems are typically large and hence give rise to large graphs, which can be analysed and solved using big data technologies. This paper explores the performance of single-source shortest path graph computations on the Apache Spark big data platform. We use United States road network data, modelled as graphs, and calculate shortest paths between vertices. The experiments are performed on the Aziz supercomputer (a Top500 machine). We solve problems of varying graph sizes, i.e., the road networks of various US states, and analyse Spark’s parallelization behavior. As expected, the speedup depends on both the size of the data and the number of parallel nodes.
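
For readers unfamiliar with the underlying problem, the following is a minimal sequential sketch of a single-source shortest path computation on a weighted road graph. It uses Dijkstra's algorithm in plain Python and is not the Spark implementation evaluated in this paper; the function name `single_source_shortest_paths` and the toy graph `toy_roads` are hypothetical and chosen only for illustration.

```python
import heapq


def single_source_shortest_paths(graph, source):
    """Dijkstra's algorithm on a dict-of-dicts adjacency map.

    Assumes all edge weights are non-negative (e.g. road lengths).
    Returns a dict mapping each reachable vertex to its distance from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy road network: vertices are intersections,
# edge weights are road lengths (undirected, so listed in both directions).
toy_roads = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 2.0, "D": 1.0},
    "C": {"A": 1.0, "B": 2.0, "D": 5.0},
    "D": {"B": 1.0, "C": 5.0},
}

distances = single_source_shortest_paths(toy_roads, "A")
```

A sequential baseline of this kind is what the speedup of a parallel run is usually measured against; a distributed platform would instead partition the vertices and edges across worker nodes.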
