Ranking on Very Large Knowledge Graphs

Abdelmoneim Amer Desouki, Michael Röder, Axel-Cyrille Ngonga Ngomo

doi:10.1145/3342220.3343660

Abstract

Ranking plays a central role in a large number of applications driven by RDF knowledge graphs. Over the last years, many popular RDF knowledge graphs have grown so large that rankings for the facts they contain cannot be computed directly using the currently common 64-bit platforms. In this paper, we tackle two problems: computing ranks on such large knowledge bases efficiently, and computing them incrementally. First, we present D-HARE, a distributed approach for computing ranks on very large knowledge graphs. D-HARE assumes the random surfer model and relies on data partitioning to compute matrix multiplications and transpositions on disk for matrices of arbitrary size. Moreover, the data partitioning underlying D-HARE allows most of its steps to be executed in parallel. As very large knowledge graphs are often updated periodically, we tackle the incremental computation of ranks on large knowledge bases as a second problem. We address this problem by presenting I-HARE, an approximation technique for calculating the overall ranking scores of a knowledge graph without the need to recalculate the ranking from scratch at each new revision. We evaluate our approaches by calculating ranks on the $3 \times 10^9$ and $2.4 \times 10^9$ triples from Wikidata and LinkedGeoData, respectively. Our evaluation demonstrates that D-HARE is the first holistic approach for computing ranks on very large RDF knowledge graphs. In addition, our incremental approach achieves a root mean squared error of less than $10^{-7}$ in the best case. Both D-HARE and I-HARE are open-source and are available at https://github.com/dice-group/incrementalHARE.
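
To make the two ideas named in the abstract concrete (ranking under the random surfer model, and partitioning the data so that the matrix-vector product can be computed block by block), the following is a minimal Python sketch. It is not the authors' implementation: it runs in memory on a plain directed graph rather than on disk over RDF triples, and the function names (`partition_edges`, `random_surfer_ranks`, `rmse`), the damping factor of 0.85, and the block size are all assumptions chosen for illustration. The `rmse` helper only shows how a root mean squared error between two rank vectors can be measured.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source node, target node); hypothetical toy representation


def partition_edges(edges: List[Edge], block_size: int) -> Dict[int, List[Edge]]:
    """Group edges by the block that contains their target node."""
    blocks: Dict[int, List[Edge]] = {}
    for src, dst in edges:
        blocks.setdefault(dst // block_size, []).append((src, dst))
    return blocks


def random_surfer_ranks(edges: List[Edge], num_nodes: int, block_size: int = 2,
                        damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Power iteration for the random surfer model, computed one block at a time."""
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1
    blocks = partition_edges(edges, block_size)

    ranks = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        # Rank mass of nodes without outgoing edges is spread uniformly.
        dangling = sum(r for r, d in zip(ranks, out_degree) if d == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_ranks = [base] * num_nodes
        # Each block only writes to the rank entries of its own target nodes,
        # so the blocks could be processed independently of each other.
        for block_edges in blocks.values():
            for src, dst in block_edges:
                new_ranks[dst] += damping * ranks[src] / out_degree[src]
        ranks = new_ranks
    return ranks


def rmse(a: List[float], b: List[float]) -> float:
    """Root mean squared error between two rank vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 0), (2, 1), (3, 0)]
    coarse = random_surfer_ranks(toy_edges, num_nodes=4, block_size=4)
    fine = random_surfer_ranks(toy_edges, num_nodes=4, block_size=1)
    print(coarse)
    print(rmse(coarse, fine))
```

In this sketch the partitioning changes only the order in which contributions are accumulated, so different block sizes yield the same ranks up to floating-point rounding. A disk-based variant would additionally stream each block from storage instead of holding all edges in memory.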

Full Text