Abstract

The size of graph data in various application domains is increasing dramatically; the number of nodes may reach the scale of hundreds of millions or even more. Computing shortest paths between two given nodes is a fundamental operation over graphs, but it is known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining the actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs, it is often essential to find many, if not all, shortest paths between two given nodes. The shortest distance query is likewise a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding approach, which selects a set of graph nodes as landmarks and computes the shortest distances from each landmark to all nodes as an embedding. To answer a shortest distance query, the precomputed distances from the landmarks to the two query nodes are used to compute an approximate shortest distance based on the triangle inequality. In this paper, I analyze the factors that affect the accuracy of distance estimation in landmark embedding. In particular, I find that a globally selected, query-independent landmark set may introduce a large relative error, especially for nearby query nodes. To address this issue, I propose a query-dependent local landmark scheme, which identifies a local landmark close to both query nodes and provides more accurate distance estimation than the traditional global landmark approach. I also propose efficient local landmark indexing and retrieval techniques, together with a scalable sketch-based index structure that not only supports estimation of node distances but also computes the corresponding shortest paths, while keeping both offline indexing complexity and online query complexity low.
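
The landmark embedding idea summarized in the abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph stored as an adjacency dictionary; the function names, the toy graph, and the hand-picked landmarks are illustrative assumptions and are not taken from the paper. The "local" landmark here is chosen by hand to show the effect on relative error, and does not reproduce the paper's indexing or retrieval scheme.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_embedding(graph, landmarks):
    """Offline step: precompute distances from each landmark to all nodes."""
    return {lm: bfs_distances(graph, lm) for lm in landmarks}


def estimate_distance(embedding, s, t):
    """Online step: triangle-inequality upper bound, min over landmarks of
    d(l, s) + d(l, t). Returns None if no landmark reaches both nodes."""
    candidates = [d[s] + d[t] for d in embedding.values() if s in d and t in d]
    return min(candidates) if candidates else None


# Toy graph (assumed for illustration): a path 0-1-2-3 with two leaves, 4 and 5,
# both attached to node 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5)]
graph = {n: [] for n in range(6)}
for u, v in edges:
    graph[u].append(v)
    graph[v].append(u)

true_distance = bfs_distances(graph, 4)[5]                               # 2
global_estimate = estimate_distance(build_embedding(graph, [0]), 4, 5)   # 8
local_estimate = estimate_distance(build_embedding(graph, [3]), 4, 5)    # 2
```

In this toy case the distant landmark 0 overestimates the distance between the nearby nodes 4 and 5 by a factor of four, while a landmark adjacent to both query nodes gives the exact value. This is the kind of error for nearby query nodes that motivates a query-dependent landmark.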

