From General to Specialized Domain: Analyzing Three Crucial Problems of Biomedical Entity Disambiguation

Stefan Zwicklbauer, Christin Seifert, Michael Granitzer

doi:10.1007/978-3-319-22849-5_6

Abstract

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. Most disambiguation systems focus on general-purpose knowledge bases like DBpedia but leave open the question of how those results generalize to more specialized domains. This question is particularly important in the context of Linked Open Data, which forms an enormous resource for disambiguation. We implement a ranking-based (Learning To Rank) disambiguation system and provide a systematic evaluation of biomedical entity disambiguation with respect to three crucial and well-known properties of specialized disambiguation systems. These are (i) entity context, i.e. the way entities are described, (ii) user data, i.e. the quantity and quality of externally disambiguated entities, and (iii) the quantity and heterogeneity of entities to disambiguate, i.e. the number and size of different domains in a knowledge base. Our results show that (i) the choice of entity context that attains the best disambiguation results strongly depends on the amount of available user data, (ii) disambiguation results with large-scale and heterogeneous knowledge bases strongly depend on the entity context, (iii) disambiguation results are robust against a moderate amount of noise in user data, and (iv) some results can be significantly improved with a federated disambiguation approach that uses different entity contexts. Our results indicate that disambiguation systems must be carefully adapted when expanding their knowledge bases with special domain entities.

Full Text