NScaleSpark

Abdul Quamar, Amol Deshpande

doi:10.1145/2980523.2980529

Abstract

In this paper, we describe NScaleSpark, a framework for executing large-scale distributed graph analysis tasks on the Apache Spark platform. NScaleSpark is motivated by the increasing interest in executing rich and complex analysis tasks over large graph datasets. There is much recent work on vertex-centric graph programming frameworks for executing such analysis tasks. These systems espouse a think-like-a-vertex (TLV) paradigm; examples include Pregel, Apache Giraph, GPS, Grace, and GraphX (built on top of Apache Spark). However, the TLV paradigm is not suitable for many complex graph analysis tasks, which typically require processing information aggregated over neighborhoods or subgraphs of the underlying graph. NScaleSpark is instead based on a think-like-a-subgraph paradigm (also recently called think-like-an-embedding [23]), in which users specify computations to be executed against a large number of multi-hop neighborhoods or subgraphs of the data graph. NScaleSpark builds upon our prior work on the NScale system [18], which was built on top of the Hadoop MapReduce system. We describe how we reimplemented NScale on the Apache Spark platform, the key challenges therein, and the design decisions we made. NScaleSpark uses a series of RDD transformations, guided by a cost-based optimizer, to extract the relevant subgraphs and hold them in distributed memory with a minimal footprint. Our in-memory graph data structure enables efficient graph computations over large-scale graphs. Our experimental results over several real-world data sets and applications show orders-of-magnitude improvements in performance and total cost over GraphX and other vertex-centric approaches.
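To make the think-like-a-subgraph idea concrete, the following is a minimal single-machine sketch in plain Python. It is not the NScaleSpark API and uses neither Spark nor a cost-based optimizer. All names (`build_adjacency`, `k_hop_subgraph`, `edge_density`, `run_on_neighborhoods`) and the toy edge list are hypothetical, chosen only for illustration. The sketch extracts the k-hop neighborhood around each chosen center vertex and then applies a user-supplied function to each extracted subgraph as a whole, rather than to individual vertices.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def k_hop_subgraph(adj, center, k):
    """Return (vertices, edges) of the subgraph induced by all
    vertices within k hops of `center`."""
    visited = {center}
    frontier = {center}
    for _ in range(k):
        # Expand one hop, keeping only vertices not seen before.
        frontier = {n for v in frontier for n in adj[v]} - visited
        visited |= frontier
    # Induced edges, each stored once with the smaller endpoint first.
    sub_edges = {(u, v) for u in visited for v in adj[u]
                 if v in visited and u < v}
    return visited, sub_edges


def edge_density(vertices, sub_edges):
    """Example user computation: fraction of possible edges present."""
    n = len(vertices)
    return 0.0 if n < 2 else 2 * len(sub_edges) / (n * (n - 1))


def run_on_neighborhoods(edges, centers, k, compute):
    """Apply `compute` to the k-hop neighborhood of every center."""
    adj = build_adjacency(edges)
    return {c: compute(*k_hop_subgraph(adj, c, k)) for c in centers}


# Hypothetical toy graph: a triangle (1, 2, 3) with a tail 3-4-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
result = run_on_neighborhoods(edges, centers=[1, 4], k=1,
                              compute=edge_density)
print(result)
```

In this toy graph the 1-hop neighborhood of vertex 1 is the full triangle, so its density is 1.0, while the neighborhood of vertex 4 is a path on three vertices with density 2/3. A vertex-centric program would need several message-passing rounds to assemble the same neighborhood state at each vertex, which is the limitation the abstract attributes to the TLV paradigm.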

Similar Papers

NScale
Abdul Quamar ... Amol Deshpande
Proceedings of the VLDB Endowment | VOL. 7
01 Aug 2014

Using Data Stream Management Systems for Traffic Analysis – A Case Study –
Thomas Plagemann ... Ernst W Biersack
01 Jan 2004

Gradation and Map Analysis in Area-Class Maps
Barry J Kronenfeld
01 Jan 2004

GLog: A high level graph analysis system using MapReduce
Jun Gao ... Jiashuai Zhou
01 Mar 2014
