Solving Large Graph Problems in MapReduce-Like Frameworks via Optimized Parameter Configuration

Huanle Xu, Zhibo Yang, Ronghai Yang, Wing Cheong Lau

doi:10.1007/978-3-319-27122-4_36

Abstract

In this paper, we propose a scheme for solving large, dense graph problems under the MapReduce framework. The graph data is organized into blocks, and the blocks are assigned to different map workers for parallel processing. The intermediate results of the map workers are combined by a single reduce worker for the next round of processing. The procedure is iterative, and the graph size can be reduced substantially after each round. In the last round, the remaining small graph is processed on a single map worker to produce the final result. Specifically, we present basic algorithms, such as Minimum Spanning Tree, Finding Connected Components, and Single-Source Shortest Path, that can be implemented efficiently using this scheme. We also offer a mathematical formulation for determining the parameters of the scheme so as to achieve the optimal running time. The proposed scheme can also be applied on MapReduce-like platforms such as Spark. We use our own cluster and Amazon EC2 as testbeds to evaluate the proposed Minimum Spanning Tree algorithm under the MapReduce and Spark frameworks, respectively. The experimental results match our theoretical analysis well. Using this approach, many parallelizable problems can be solved efficiently in MapReduce-like frameworks.
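
The round structure described in the abstract can be illustrated with a minimal single-process sketch for Minimum Spanning Tree. It is not the paper's implementation: the edge-list blocking (round-robin split), the function names `local_msf` and `blocked_mst`, and the parameters `num_blocks` and `single_worker_limit` are assumptions made for illustration, and the "map" and "reduce" steps are plain Python loops rather than MapReduce or Spark jobs. Each simulated map worker keeps only the minimum spanning forest of its block; by the cycle property, the edges it discards cannot be needed for a global minimum spanning tree, so the graph shrinks from round to round.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v); vertices are 0..num_vertices-1


def local_msf(num_vertices: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with union-find: minimum spanning forest of `edges`."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept: List[Edge] = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so it stays in the forest
            parent[ru] = rv
            kept.append((w, u, v))
    return kept


def blocked_mst(num_vertices: int, edges: List[Edge],
                num_blocks: int, single_worker_limit: int) -> List[Edge]:
    """Hypothetical driver: iterate map/reduce rounds, then finish on one worker."""
    current = list(edges)
    while len(current) > single_worker_limit and num_blocks > 1:
        # Assumed blocking: round-robin split of the edge list.
        blocks = [current[i::num_blocks] for i in range(num_blocks)]
        # "Map": each worker reduces its block to a local spanning forest.
        partial = [local_msf(num_vertices, block) for block in blocks]
        # "Reduce": a single worker concatenates the surviving edges.
        reduced = [edge for forest in partial for edge in forest]
        if len(reduced) == len(current):
            break  # no edge was filtered out; stop iterating
        current = reduced
    # Last round: the remaining small graph is handled by one worker.
    return local_msf(num_vertices, current)
```

For example, `blocked_mst(6, edges, num_blocks=2, single_worker_limit=10)` on a complete 6-vertex graph runs one filtering round (each block of 7 or 8 edges shrinks to at most 5) before the final single-worker pass. How to choose values such as `num_blocks` well is exactly the parameter-configuration question the paper addresses; the values here are arbitrary.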

Full Text