Abstract

Nowadays, there is a growing interest in solving NLP tasks using external knowledge storage, for example, in information retrieval, question-answering systems, dialogue systems, etc. Thus it is important to establish relations between entities in the processed text and a knowledge base. This article is devoted to entity linking, where Wikidata is used as an external knowledge base. We consider scientific terms in Russian as entities. Traditional entity linking system has three stages: entity recognition, candidates (from knowledge base) generation, and candidate ranking. Our system takes raw text with the defined terms in it as input. To generate candidates we use string match between terms in the input text and entities from Wikidata. The candidate ranking stage is the most complicated one because it requires semantic information. Several experiments for the candidate ranking stage with different models were conducted, including the approach based on cosine similarity, classical machine learning algorithms, and neural networks. Also, we extended the RUSERRC dataset, adding manually annotated data for model training. The results showed that the approach based on cosine similarity leads to better results compared to others and doesn’t require manually annotated data. The dataset and system are open-sourced and available for other researchers.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.