Abstract

This work tackles the problem of matching Wikipedia red links with existing articles. A link in a Wikipedia page is considered red when it leads to a nonexistent article. Articles that correspond to such red links may exist in other Wikipedia editions. We propose a way to match red links in one Wikipedia edition to existing pages in another edition. We define the task as a Named Entity Linking problem because red link titles are mostly named entities, and we solve it for Ukrainian red links and existing English pages. We created a dataset of the 3171 most frequent Ukrainian red links and a dataset of almost 3 million pairs of red links and the most probable candidates for the corresponding pages in English Wikipedia. This dataset is publicly released. Based on an analysis of the data, we define its conceptual characteristics (word and graph properties) and exploit these properties in entity resolution. The BabelNet knowledge base was applied to this task and serves as the baseline for our approach (F1 score of 32%). To improve on this result, we introduced several similarity metrics based on the red link characteristics mentioned above. Combined in a linear model, they yield an F1 score of 85%. To the best of our knowledge, we are the first to state the problem and propose a solution for red links in the Ukrainian Wikipedia edition.
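As an illustration of how word-based and graph-based similarity metrics might be combined in a linear model to rank candidate pages, consider the minimal Python sketch below. It is not the method used in this work: the two metrics (a character-level ratio and a Jaccard overlap of linking pages), the equal weights, and all function names are assumptions chosen for the example. It also assumes that the red link title has already been transliterated or translated into Latin script.

```python
from difflib import SequenceMatcher


def word_similarity(red_title: str, candidate_title: str) -> float:
    """Character-level similarity ratio in [0, 1] between two titles.

    Assumes both titles are already in the same script.
    """
    return SequenceMatcher(None, red_title.lower(), candidate_title.lower()).ratio()


def graph_similarity(red_neighbors: set, candidate_neighbors: set) -> float:
    """Jaccard overlap of the sets of pages linking to each title."""
    union = red_neighbors | candidate_neighbors
    if not union:
        return 0.0
    return len(red_neighbors & candidate_neighbors) / len(union)


def linear_score(features, weights, bias=0.0):
    """Weighted sum of similarity features (hypothetical weights)."""
    return bias + sum(w * f for w, f in zip(weights, features))


def best_candidate(red_title, red_neighbors, candidates, weights=(0.5, 0.5)):
    """Return the candidate title with the highest linear score.

    `candidates` maps each candidate title to the set of pages linking to it.
    """
    scores = {
        title: linear_score(
            (word_similarity(red_title, title),
             graph_similarity(red_neighbors, neighbors)),
            weights,
        )
        for title, neighbors in candidates.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data, invented for the example.
    candidates = {
        "John Smith (politician)": {"A", "B", "C"},
        "Jane Doe": {"D"},
    }
    print(best_candidate("John Smith", {"A", "B"}, candidates))
```

In practice the weights would be fitted on labeled pairs rather than fixed by hand, and the resulting matches would be evaluated with precision, recall, and F1.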
