Abstract

Recently, large-scale graph data processing and mining have drawn great attention, and many distributed graph processing systems have been proposed. However, large-scale graph processing remains a challenging problem, because in some cases the computation time is still unacceptable, especially when time is limited. As illustrated in Table 1, nearly three hours are needed to run the Single-Source Shortest Path (SSSP) algorithm on the USA-road dataset using performant open-source distributed graph processing systems. In this paper, we propose an effective priority-based message sampling (PMS) approach that further improves the performance of distributed graph processing at the cost of some accuracy loss. Observing that passing and processing messages dominate the computation time, our approach eliminates less useful messages directly, without passing them, which effectively reduces the computation overhead. We implement our approach based on Apache Giraph, a popular open-source implementation of Google's Pregel, and report preliminary results from our prototype system. The experimental results show that our approach achieves reasonable accuracy with much less computation time.
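The abstract does not say how message priority is computed, so the following is only a rough illustration of the general idea of dropping messages before they are passed. It is a single-process Python simulation of Pregel-style SSSP in which, at every superstep, only a `keep_ratio` fraction of the outgoing messages is delivered. The priority rule (a smaller tentative distance counts as more useful, which a sender can compute without knowing the receiver's state), the `keep_ratio` parameter, and the function name are all hypothetical assumptions. They are not the PMS design or any Apache Giraph API.

```python
import heapq
import math
from typing import Dict, List, Tuple

# vertex -> list of (neighbor, edge_weight)
Graph = Dict[int, List[Tuple[int, float]]]


def sssp_with_message_sampling(
    graph: Graph, source: int, keep_ratio: float = 1.0, max_supersteps: int = 100
) -> Tuple[Dict[int, float], int]:
    """Hypothetical sketch: Pregel-style SSSP that drops low-priority messages.

    Returns the (possibly approximate) distances and the number of messages sent.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    inf = float("inf")
    dist = {v: inf for v in graph}
    dist[source] = 0.0
    active = {source}
    messages_sent = 0

    for _ in range(max_supersteps):
        if not active:
            break
        # Each active vertex proposes a tentative distance to every neighbor,
        # as in the classic Pregel SSSP example.
        candidates = [(dist[u] + w, v) for u in active for v, w in graph[u]]
        if not candidates:
            break
        # Assumed priority rule: smaller tentative distance = more useful.
        # Keep at least one message so the computation can make progress.
        k = max(1, math.ceil(keep_ratio * len(candidates)))
        kept = heapq.nsmallest(k, candidates)  # the rest are never sent
        messages_sent += len(kept)

        # Receivers combine incoming messages with min, then update.
        inbox: Dict[int, float] = {}
        for value, v in kept:
            inbox[v] = min(value, inbox.get(v, inf))
        active = set()
        for v, value in inbox.items():
            if value < dist[v]:
                dist[v] = value
                active.add(v)  # only improved vertices send next superstep

    return dist, messages_sent
```

With `keep_ratio=1.0` the sketch reduces to ordinary Pregel-style SSSP. Lowering it cuts the number of messages, but a dropped message is never resent unless its sender is reactivated, so some distances may stay overestimated or unreached. This is the accuracy-for-time trade-off the abstract describes, in toy form.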
