Abstract

Given a large graph, such as a social network or a knowledge graph, a fundamental query is to find the distance from a source vertex to another vertex in the graph. As real graphs grow very large and many distributed graph systems, such as Pregel, Pregel+, Giraph, and GraphX, have been proposed, processing single-source distance queries on these systems deserves more attention. In this paper, we propose a landmark-based framework to optimize distance computation over distributed graph systems. We also use a measure called set betweenness to select the optimal set of landmarks for distance computation. Although we prove that selecting the optimal set of landmarks is NP-hard, we propose a heuristic distributed algorithm with a guaranteed approximation ratio. Experiments on large real graphs confirm the superiority of our methods.

Highlights

  • With the rapid development of the Internet and social networks in recent years, large-scale data in graph models have gradually increased

  • To improve the scalability of SSSP length query evaluation, we propose a landmark-based framework over distributed graph systems for computing SSSP length queries in large graphs

  • We propose a landmark-based framework to evaluate the SSSP length query over distributed graph systems, which utilizes the characteristics of the distributed graph systems to improve the efficiency and scalability of our method

Summary

INTRODUCTION

With the rapid development of the Internet and social networks in recent years, the volume of large-scale graph-structured data has grown steadily. As graph models are used in a growing number of applications, the single-source shortest path length (SSSP length) query, one of the most classic graph problems, has been studied for more than half a century and continues to receive attention [6], [16], [26]. To improve the scalability of SSSP length query evaluation, we propose a landmark-based framework over distributed graph systems for computing SSSP length queries in large graphs. We use the precomputed shortest path trees of the landmarks to compute the distances from the source vertex of an SSSP length query to the other vertices over distributed graph systems. We prove that selecting the optimal set of landmarks is NP-hard, and we propose a heuristic distributed strategy with a guaranteed approximation ratio. We conduct extensive experiments over different kinds of real graphs on multiple distributed graph systems (Pregel+ [25], Giraph [21] and GraphX [9]) to verify the performance of the proposed techniques.
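The general idea behind landmark-based distance computation can be illustrated with a minimal single-machine sketch. It is not the distributed algorithm of the paper, whose details are not given in this summary. It assumes an unweighted, undirected graph stored as an adjacency dictionary, and the names `bfs_distances`, `landmark_upper_bounds`, and the toy graph `adj` are hypothetical. Each landmark's distance table is computed once; for a query source s, the triangle inequality gives the upper bound d(s, v) ≤ d(s, l) + d(l, v) for every landmark l.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def landmark_upper_bounds(adj, landmarks, source):
    """Upper-bound d(source, v) for all v via min over landmarks of d(source, l) + d(l, v).

    Assumption: the graph is undirected, so d(source, l) == d(l, source) and one
    distance table per landmark is enough.
    """
    # Offline step: one distance table (shortest path tree depths) per landmark.
    tables = {l: bfs_distances(adj, l) for l in landmarks}
    bounds = {}
    for v in adj:
        best = float("inf")
        for table in tables.values():
            if source in table and v in table:
                best = min(best, table[source] + table[v])
        bounds[v] = best
    return bounds


# Hypothetical toy graph: a - b, b - c, b - d, c - e, d - e.
adj = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b", "e"],
    "d": ["b", "e"],
    "e": ["c", "d"],
}

exact = bfs_distances(adj, "a")
bounds = landmark_upper_bounds(adj, ["b"], "a")
```

The bound is exact whenever some landmark lies on a shortest path from the source to the target, and only an overestimate otherwise. In the toy graph, every shortest path from `a` passes through `b`, so the bounds from `a` to the other vertices match the exact distances; for source `c` and target `e`, the landmark `b` is off the shortest path and the bound is looser than the true distance. This is why the choice of landmarks matters.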

GRAPH AND PATH
LANDMARK
COMPUTATIONAL MODEL OF A DISTRIBUTED GRAPH SYSTEM
OUR SOLUTION
ANALYSIS
EXPERIMENTS
COMPARISON WITH DIFFERENT LANDMARK SELECTION METHODS
EFFECT OF LANDMARKS’ NUMBER
LANDMARK-BASED DISTANCE COMPUTATION
DISTRIBUTED GRAPH PROCESSING SYSTEMS
Findings
CONCLUSION