Abstract

Link prediction is the task of computing the likelihood that a link exists between two given nodes in a network. With countless applications in different areas of science and engineering, link prediction has received the attention of many researchers working in various disciplines. Considerable research efforts have been invested into the development of increasingly accurate prediction methods. Most of the proposed algorithms, however, have limited use in practice because of their high computational requirements. The aim of this work is to develop a scalable link prediction algorithm that offers a higher overall predictive power than existing methods. The proposed solution falls into the class of global, parameter-free similarity-popularity-based methods, and in it, we assume that network topology is governed by three factors: popularity of the nodes, their similarity and the attraction induced by local neighbourhood. In our approach, popularity and neighbourhood-caused attraction are computed directly from the network topology and factored out by introducing a specific weight map, which is then used to estimate the dissimilarity between non-adjacent nodes through shortest path distances. We show through extensive experimental testing that the proposed method produces highly accurate predictions at a fraction of the computational cost required by existing global methods and at a low additional cost compared to local methods. The scalability of the proposed algorithm is demonstrated on several large networks having hundreds of thousands of nodes.

Highlights

  • Background and Related Work: The data for a link prediction problem consists of a network G(V, E), where V is the set of nodes and E is the set of edges

  • In [2], the authors introduced the Stochastic Block Model (SBM), in which the nodes are partitioned into groups and the probability that a link exists between two nodes depends on the groups to which they belong

  • We report the results of a representative sample of these methods: Adamic-Adar index (ADA), common neighbours (CNE), Cannistraci-Hebb model (CH), hub promoted index (HPI), Jaccard index (JID), preferential attachment index (PAT), and resource allocation index (RAL)


Summary

Introduction

The data for a link prediction problem consists of a network G(V, E), where V is the set of nodes and E is the set of edges. The edge set satisfies E ⊆ U, where U is the set of all possible undirected edges: if n is the number of nodes in the graph (that is, n = |V|), then U contains exactly n(n − 1)/2 elements, which is the maximum number of undirected edges that can exist in the network. The link prediction problem consists in discovering which elements of U − E are missing from the network or may appear in the future [5]. This is typically achieved by assigning a score s_ij to every candidate edge (i, j) to be predicted.
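The problem setup above can be illustrated with a minimal sketch, assuming a small undirected graph with no self-loops or duplicate edges. It enumerates the candidate set U − E and assigns each candidate a score s_ij using common neighbours (CNE) and the Adamic-Adar index (ADA), two of the local indices named in the highlights. This is not the method proposed in the paper; the function names and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import log


def non_edges(nodes, edges):
    """Return U - E: all unordered node pairs that are not already edges."""
    present = {frozenset(e) for e in edges}
    return [(i, j) for i, j in combinations(sorted(nodes), 2)
            if frozenset((i, j)) not in present]


def neighbours(nodes, edges):
    """Build an adjacency map {node: set of neighbours} for an undirected graph."""
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def common_neighbours(adj, i, j):
    """CNE score: number of neighbours shared by i and j."""
    return len(adj[i] & adj[j])


def adamic_adar(adj, i, j):
    """ADA score: each shared neighbour z contributes 1 / log(degree(z)).

    A shared neighbour is adjacent to both i and j, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / log(len(adj[z])) for z in adj[i] & adj[j])


def rank_candidates(nodes, edges, score):
    """Score every pair in U - E and sort by decreasing score."""
    adj = neighbours(nodes, edges)
    scored = [((i, j), score(adj, i, j)) for i, j in non_edges(nodes, edges)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy network: n = 5 nodes, so |U| = 5 * 4 / 2 = 10 possible edges.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

for pair, s in rank_candidates(nodes, edges, common_neighbours):
    print(pair, s)
```

On this toy graph, 6 of the 10 possible edges are present, so 4 candidate pairs are scored. Enumerating U − E explicitly takes time quadratic in n, which is only practical for small networks.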

