Implicit Entity Recognition and Linking in Tweets

Hawre Hosseini

doi:10.32920/25413838.v1

Abstract

Linking textual content to entities in a knowledge graph, where surface-form mentions of entities are linked to the appropriate knowledge graph entries, has received increasing attention. Such linking allows textual content, e.g., social user-generated content, to be interpreted at a higher semantic level. However, recent research has shown that at least 15% of social user-generated content does not contain an explicit surface form for the entities it discusses; in other words, the subject of the content is only implied. In such cases, existing named entity recognition and linking methods, known as explicit entity linking methods, cannot perform linking because the entity's surface form is missing. This dissertation introduces and publicly shares a comprehensive gold standard dataset for the tasks of implicit named entity recognition and linking, and proposes approaches to both tasks. We formulate the problem of recognizing implicit entity mentions in tweets and propose to leverage categorical and linguistically inspired features based on Systemic Functional Linguistics (SFL). Our implicit named entity recognizer achieves promising results on different evaluation metrics. Additionally, we propose two approaches for linking implicit mentions in tweets. In the first, we formulate implicit entity linking as an ad-hoc document retrieval process in which the query is the tweet to be implicitly linked and the document space is the set of textual descriptions of entities in the knowledge graph. We systematically compare our method with existing work and show that it improves on a range of retrieval measures. In the second approach, we model implicit entity linking as a learning-to-rank problem in which knowledge graph entities are ranked by their relevance to the input tweet. In doing so, we introduce and systematically classify features suited to identifying implicit entities.
In our experiments, we show that our proposed features improve over the state of the art. For the SFL-based recognition of implicit entity mentions, as well as for the ad-hoc retrieval and learning-to-rank approaches to linking such mentions, we provide a qualitative assessment of the root causes of mislabeled instances in our experiments.
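The first linking approach treats the tweet as a query and the textual descriptions of knowledge graph entities as the documents to be ranked. The following is a minimal sketch of that framing. It assumes Okapi BM25 as the retrieval model and uses a naive lowercase tokenizer. The dissertation's actual retrieval model, preprocessing, and entity descriptions may differ. The function name `bm25_rank` and the toy entity descriptions are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(tweet: str, entity_descriptions: dict[str, str],
              k1: float = 1.2, b: float = 0.75) -> list[tuple[str, float]]:
    """Hypothetical helper: rank entities by the BM25 score of their description
    against the tweet, treating the tweet as the query."""
    docs = {eid: tokenize(desc) for eid, desc in entity_descriptions.items()}
    n_docs = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: how many entity descriptions contain each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    query_terms = tokenize(tweet)

    scores = []
    for eid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # BM25 idf with +1 inside the log so it stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((eid, score))
    # Highest score first; ties broken by entity id so the order is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Toy example (made-up descriptions): the tweet never names the show explicitly.
descriptions = {
    "Game_of_Thrones": "fantasy drama television series known for its dragons; "
                       "each new episode drew record audiences",
    "Breaking_Bad": "crime drama television series about a chemistry teacher",
    "Komodo_dragon": "large lizard species found in Indonesia",
}
ranking = bm25_rank("I can't stop watching the new episode with the dragons", descriptions)
print(ranking[0][0])  # → Game_of_Thrones
```

BM25 is only one possible scoring function for this framing. The learning-to-rank approach outlined in the abstract would instead combine many relevance signals as features when ranking the candidate entities.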
