Abstract

We focus on the vertex-centric (VC) model introduced in Pregel, a Google system for distributed graph processing. In particular, we consider two popular implementations of the VC model: Apache Giraph and GraphChi. The first is a VC system for cluster computing, while the second is a VC system for a single PC. Apache Giraph became very popular after careful engineering by Facebook researchers in 2012 scaled the computation of PageRank to a trillion-edge graph of user interactions using 200 machines. GraphChi became popular around the same time, in 2012, because it made it possible to perform intensive graph computations on a single PC in just under 59 minutes, whereas distributed systems were taking 400 minutes on a cluster of about 1,000 computers (as also reported by MIT Technology Review). Since then, new versions of Apache Giraph and GraphChi have been released, implementing new ideas and optimizations. It is therefore time to revalidate the claims made four years ago, and in this work we embark on that validation. We consider three cornerstone graph problems: computing PageRank, shortest paths, and weakly connected components. Based on current experiments, we conclude that at present, even for a moderate number of simple machines, Apache Giraph outperforms GraphChi for all the algorithms and datasets tested. This is in contrast to the claims made by the GraphChi authors in 2012.
