Abstract

Fine-grained entity typing is important to tasks like relation extraction and knowledge base construction. We find, however, that fine-grained entity typing systems perform poorly on general entities (e.g., “ex-president”) compared to named entities (e.g., “Barack Obama”). This is due to a lack of general entities in existing training data sets. We show that this problem can be mitigated by automatically generating training data from wordnets. We use a German WordNet equivalent, GermaNet, to automatically generate training data for German general entity typing. We use this data to supplement named entity data to train a neural fine-grained entity typing system. This leads to a 10% improvement in accuracy of the prediction of level 1 FIGER types for German general entities, while decreasing named entity type prediction accuracy by only 1%.

Highlights

  • It is unclear how well state-of-the-art entity typing systems perform on general entities (GEs) like ‘ex-president’

  • We find that accuracy and F1 score of a state-of-the-art German fine-grained entity typing system are 17% lower on general entities than on named entities

  • Because manually annotating GE typing data is costly and time intensive, we propose an approach that uses existing resources to create silver-annotated GE typing data


Summary

Introduction

In contrast to coarse-grained entity typing, fine-grained entity typing uses a larger set of types (e.g., 112 types in the FIGER ontology (Ling and Weld, 2012)) and a multilevel type hierarchy. While the typing of the named entity (NE) ‘Barack Obama’ can be performed by state-of-the-art entity typing systems, it is unclear how well these systems perform on general entities (GEs) like ‘ex-president’. We find that accuracy and F1 score of a state-of-the-art German fine-grained entity typing system are 17% lower on general entities than on named entities (see Table 1 and Section 5). This is because of a lack of general entities in existing training data sets. Since manually creating such data is costly and time intensive, we propose an approach that uses existing resources to create silver-annotated GE typing data. For this we use German text and GermaNet, the German WordNet equivalent. We achieve a 10% improvement in accuracy of the prediction of level 1 FIGER types for German general entities.
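The core idea of silver-annotating GE typing data from a wordnet can be sketched as follows. This is a toy illustration, not the paper's implementation: the hand-coded hypernym table and the hypernym-to-type mapping below are stand-ins for the real GermaNet hierarchy and for whatever FIGER mapping the authors use; all names here are hypothetical.

```python
# Toy stand-in for GermaNet: child -> parent hypernym edges.
# In a real pipeline these edges come from the wordnet resource itself.
HYPERNYMS = {
    "ex-president": "president",
    "president": "leader",
    "leader": "person",
    "schnauzer": "dog",
    "dog": "animal",
    "cathedral": "church",
    "church": "building",
}

# Assumed mapping from selected hypernyms to level-1 FIGER-style types.
TYPE_OF = {"person": "/person", "animal": "/living_thing", "building": "/building"}

def silver_type(word):
    """Walk up the hypernym chain until an ancestor with a mapped type is found."""
    node = word
    while node is not None:
        if node in TYPE_OF:
            return TYPE_OF[node]
        node = HYPERNYMS.get(node)
    return None  # no mapped ancestor -> word is skipped

def make_silver_data(words):
    """Produce (word, type) silver training pairs, dropping unmappable words."""
    return [(w, silver_type(w)) for w in words if silver_type(w) is not None]

print(make_silver_data(["ex-president", "schnauzer", "cathedral"]))
# -> [('ex-president', '/person'), ('schnauzer', '/living_thing'), ('cathedral', '/building')]
```

The resulting (general entity, type) pairs can then be mixed into the named-entity training data of a neural typing system, as the paper describes.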

