Abstract

Graph Processing has been widely used to capture complex data dependency and uncover relationship insights. Due to the ever-growing graph scale and algorithm complexity, distributed graph processing has become more and more popular. In this paper, we investigate how to balance performance and cost for large scale graph processing on configurable virtual machines (VMs). We analyze the system architecture and implementation details of a Pregel-like distributed graph processing framework and develop a system-aware model to predict the execution time. Consequently, cost effective execution scenarios are recommended by selecting a certain number of VMs with specified capability subject to the predefined resource price and user preference. Experiments using synthetic and real world graphs have verified that system-aware model can achieve much higher prediction accuracy than popular machine-learning models which treat graph processing framework as a black box. As a result, the recommended execution scenarios have comparable cost efficiency to the optimal scenarios.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call