Abstract

With the emergence of numerous link prediction methods, how to evaluate them accurately and select the appropriate one has become a key problem that cannot be ignored. Since AUC was first used for link prediction evaluation in 2008, it has arguably been the most preferred metric because it balances the role of wins (the testing link has a higher score than the unobserved link) and the role of draws (they have the same score). However, in many cases, AUC does not discriminate well enough between link prediction methods, especially those based on local similarity. Hence, we propose a new metric, called the W-index, which considers only the effect of wins and ignores draws. Our extensive experiments on various networks show that the W-index makes the accuracy scores of link prediction methods more distinguishable: it not only widens the local gap between these methods but also enlarges their global distance. We further show the reliability of the W-index by ranking change analysis and correlation analysis. In particular, some community-based approaches, which have been deemed effective, do not show any advantage after our reevaluation. Our results suggest that the W-index is a promising metric for link prediction evaluation, capable of offering convincing discrimination.
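The distinction between wins and draws can be made concrete with a small sketch. It compares every testing-link score with every unobserved-link score, then computes the usual AUC, in which a draw counts as half a win, alongside a win-only fraction. The abstract does not give the exact formula of the W-index, so `win_only_score` is an illustrative assumption rather than the paper's definition. The function names and the exhaustive (rather than sampled) comparison are likewise choices made for this example.

```python
from itertools import product


def count_outcomes(test_scores, unobserved_scores):
    """Compare every testing-link score with every unobserved-link score.

    Returns (wins, draws, total), where a win means the testing link has the
    higher score and a draw means both scores are equal.
    """
    wins = draws = 0
    for t, u in product(test_scores, unobserved_scores):
        if t > u:
            wins += 1
        elif t == u:
            draws += 1
    total = len(test_scores) * len(unobserved_scores)
    return wins, draws, total


def auc(test_scores, unobserved_scores):
    """Standard AUC for link prediction: a draw counts as half a win."""
    wins, draws, total = count_outcomes(test_scores, unobserved_scores)
    return (wins + 0.5 * draws) / total


def win_only_score(test_scores, unobserved_scores):
    """Hypothetical win-only variant: draws contribute nothing.

    Assumption for illustration only; not the paper's W-index formula.
    """
    wins, _, total = count_outcomes(test_scores, unobserved_scores)
    return wins / total


# Toy scores (made up for this example). Method A produces many ties,
# as coarse local-similarity scores often do; method B produces fewer.
unobserved = [1, 1, 0, 0]
method_a_test = [1, 1]
method_b_test = [2, 1]

print(auc(method_a_test, unobserved), win_only_score(method_a_test, unobserved))
print(auc(method_b_test, unobserved), win_only_score(method_b_test, unobserved))
```

In this toy case the two methods differ by 0.125 under AUC and by 0.25 under the win-only fraction, because the half-credit for draws no longer props up the method with more ties.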

Highlights

  • Link prediction is one of the most fundamental problems of complex networks, which aims to infer the network link formation process by predicting missed or future relationships based on currently observed links [1]

  • We discuss two side effects of draws in the AUC and propose the W-index, which only cares about wins, to obtain discriminative evaluation of link prediction methods

  • This paper introduces a series of tools for measuring the reliability and the performance of the new metric

Summary

Introduction

Link prediction is one of the most fundamental problems of complex networks, which aims to infer the network link formation process by predicting missing or future relationships based on currently observed links [1]. Various methods have been proposed for link prediction [12,13,14,15,16], most of which can be classified as either heuristic-based or learning-based approaches [17]. There are many long-standing challenges in the evaluation of link prediction methods. Many quantitative evaluation metrics used in link prediction are adopted from binary classification tasks [18]. As a typical fixed-threshold metric, the precision [21] is commonly used in the link prediction literature. Clauset et al. [22] showed that using the precision to evaluate prediction algorithms has a significant disadvantage: the precision may be high when only the top L links with the highest scores are considered, even though the algorithm’s overall performance is unsatisfactory, because some missing connections are much easier to predict than others. If a network has a heavy-tailed degree distribution, the chances are excellent that two high-degree vertices have a missing connection, and such a connection can be predicted
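As a rough illustration of the fixed-threshold precision discussed in this paragraph, the sketch below ranks candidate links by score, keeps the top L, and reports the fraction that are true testing links. The function name `precision_at_l`, the dictionary-based input, and the toy scores are assumptions made for this example. Ties are broken by insertion order because Python's sort is stable, which is only one of several possible conventions.

```python
def precision_at_l(scores, test_links, L):
    """Fraction of the L highest-scored candidate links that are testing links.

    scores     -- dict mapping a candidate link (node pair) to its score
    test_links -- set of links held out as the testing set
    L          -- number of top-ranked candidates to inspect
    """
    # Rank candidates from highest to lowest score; ties keep insertion order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:L]
    hits = sum(1 for link in top if link in test_links)
    return hits / L


# Toy data (made up): four candidate links, two of which are testing links.
scores = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.8,
    ("b", "d"): 0.3,
    ("c", "d"): 0.1,
}
test_links = {("a", "b"), ("c", "d")}

print(precision_at_l(scores, test_links, L=1))
print(precision_at_l(scores, test_links, L=2))
```

With these scores the precision is perfect at L = 1 even though the second testing link is ranked last, which mirrors the concern that a high top-L precision can hide weak overall performance.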
