Abstract

Graph processing has recently attracted much attention, particularly since the development of Google's Pregel and its open-source counterparts, Apache Giraph and GraphLab. These systems enable the distributed processing of large and complex graphs, such as web graphs and social networks. However, the efficacy of such distributed processing depends heavily on resource provisioning, even in clouds with increasingly abundant resources. In this paper, we present resource provisioning models for memory-intensive graph processing applications. In particular, we profile their memory usage patterns while taking input types and sizes into account. This profiling model makes it possible to determine the right amount of resources and the right number of workers (or containers in a graph processing framework). Since the appropriate provisioning level depends on the user's objective, we further provide a model to identify the Pareto frontier of trade-offs between performance and cost. We use a graph drawing application (GILA [4]), implemented on Apache Giraph and Hadoop YARN, as a case study. Experimental results demonstrate performance improvements of 15%-35%, with an explicit cost trade-off, through optimizing the worker count and selecting Pareto-optimal resources.
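To make the Pareto-frontier idea concrete, the sketch below filters candidate worker configurations to those not dominated on the two objectives the abstract names, performance (runtime) and cost. This is a minimal illustration, not the paper's model: the `Config` type, the estimate values, and the dominance check are all illustrative assumptions.

```python
# Minimal sketch: Pareto-frontier filtering over hypothetical
# (runtime, cost) estimates for candidate worker counts.
# All numbers here are illustrative, not measured results.

from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    workers: int
    runtime_s: float   # estimated job runtime (performance objective)
    cost_usd: float    # estimated monetary cost (cost objective)

def pareto_frontier(configs):
    """Return the configs not dominated on (runtime_s, cost_usd).

    A config is dominated if some other config is no worse on both
    objectives and strictly better on at least one.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o.runtime_s <= c.runtime_s and o.cost_usd <= c.cost_usd
            and (o.runtime_s < c.runtime_s or o.cost_usd < c.cost_usd)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c.runtime_s)

# Illustrative candidates: more workers tend to cut runtime but raise cost.
candidates = [
    Config(workers=4,  runtime_s=1200.0, cost_usd=2.0),
    Config(workers=8,  runtime_s=700.0,  cost_usd=2.4),
    Config(workers=16, runtime_s=450.0,  cost_usd=3.1),
    Config(workers=32, runtime_s=430.0,  cost_usd=5.8),
    Config(workers=12, runtime_s=800.0,  cost_usd=3.0),  # dominated by 8 workers
]

for c in pareto_frontier(candidates):
    print(f"{c.workers:>2} workers: {c.runtime_s:7.1f} s, ${c.cost_usd:.2f}")
```

Every configuration on the printed frontier represents a rational choice for some performance/cost preference; which point a user picks depends on their objective, as the abstract notes.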
