Abstract

Wikification is the process of linking the entities found in a text to their corresponding Wikipedia or Wikidata pages. Many natural language processing applications, including question-answering systems, information retrieval, fraud detection, and recommendation systems (RS), can benefit from this information extraction technique. Considerable effort has gone into entity linking (EL) for both Asian and Western languages, producing several datasets and numerous proposed methods. Despite millions of Urdu speakers worldwide, to the best of our knowledge no prior work has addressed EL for Urdu. First, this work proposes an Urdu EL pipeline that identifies named entities in text and links them to Wikidata. Second, a dataset of 550 Urdu news titles mapped to their respective Wiki IDs has been prepared for evaluation. Third, using the proposed EL pipeline, 16,738 news articles from the first-ever Urdu news RS dataset, covering 100 users, are annotated. Fourth, a Knowledge Graph (KG) subgraph of 8,439 entities and 23,080 relationship tuples is extracted from Wikidata. The TransE algorithm is then used to create KG embeddings so that the extracted KG can be used in an Urdu news RS. The final accuracy of the Urdu news RS is 60.8%.
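The abstract names TransE for embedding the extracted KG but gives no hyperparameters. The following is a minimal NumPy sketch of the standard TransE objective, not the authors' implementation: a triple (h, r, t) is scored by the distance ||h + r - t||, and training minimizes a margin ranking loss against negatives made by corrupting the tail. The embedding size, margin, learning rate, epoch count, and the toy integer IDs (standing in for Wikidata items and properties) are all assumptions chosen for illustration.

```python
import numpy as np


def score(E, R, h, r, t):
    """TransE distance ||E[h] + R[r] - E[t]||; lower means more plausible."""
    return np.linalg.norm(E[h] + R[r] - E[t])


def train_transe(triples, n_ent, n_rel, dim=16, margin=1.0, lr=0.05,
                 epochs=500, seed=0):
    """Train toy TransE embeddings with SGD (assumed hyperparameters)."""
    rng = np.random.default_rng(seed)
    bound = 6 / np.sqrt(dim)
    E = rng.uniform(-bound, bound, (n_ent, dim))
    R = rng.uniform(-bound, bound, (n_rel, dim))
    R /= np.linalg.norm(R, axis=1, keepdims=True)  # unit-norm relations at init
    for _ in range(epochs):
        # TransE keeps entity embeddings on the unit sphere between epochs.
        E /= np.linalg.norm(E, axis=1, keepdims=True)
        for h, r, t in triples:
            # Negative triple: replace the tail with a random entity.
            t_neg = rng.integers(n_ent)
            if t_neg == t:
                continue
            pos = E[h] + R[r] - E[t]
            neg = E[h] + R[r] - E[t_neg]
            d_pos, d_neg = np.linalg.norm(pos), np.linalg.norm(neg)
            if margin + d_pos - d_neg <= 0:
                continue  # margin already satisfied: hinge loss is zero
            # Gradients of the hinge loss: margin + d_pos - d_neg.
            g_pos = pos / (d_pos + 1e-9)
            g_neg = neg / (d_neg + 1e-9)
            E[h] -= lr * (g_pos - g_neg)
            R[r] -= lr * (g_pos - g_neg)
            E[t] += lr * g_pos
            E[t_neg] -= lr * g_neg
    return E, R


if __name__ == "__main__":
    # Hypothetical toy KG: entity IDs 0..5 and a single relation 0.
    triples = [(0, 0, 1), (2, 0, 3), (4, 0, 5)]
    E, R = train_transe(triples, n_ent=6, n_rel=1)
    print("true triple:", score(E, R, 0, 0, 1))
    print("corrupted triple:", score(E, R, 0, 0, 3))
```

After training, a true triple should score a smaller distance than its corruptions. A recommender can then use the learned entity vectors as features for the entities linked in each news article.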
