Abstract

Relation extraction is a fundamental task in document information extraction. Traditionally, relation extraction datasets have been annotated with named entities and classified using a predefined set of relation categories. Models then either predict both the entities and the relations (end-to-end) or assume the entities are given and classify only the relations. However, current approaches are limited by datasets with a narrow definition of entities and relations. We seek to remedy this by introducing our Descriptive Relation Dataset (DReD), which contains 3286 annotated descriptions of relations between more general noun phrases, inspired by linguistic theory. We benchmark our dataset using several seq2seq models and find that T5 achieves the best results, with a ROUGE-1 score of 75.5. We verify the usefulness of DReD by collecting feedback on 100 predictions and comparing human judgment to automated scoring methods. Finally, we confirm that relations can be described accurately by transforming the CoNLL04 and Re-TACRED datasets and mapping sentence templates to relation categories. T5 achieves competitive accuracy on CoNLL04 and Re-TACRED, with F1 scores of 78.6 and 90.4, respectively. With this paper, we show that relations can be described, thereby overcoming the limitations of previous datasets and approaches. We publicly provide our dataset and training code at https://github.com/logan-markewich/DReD.
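
The abstract describes generating free-text relation descriptions with a seq2seq model and evaluating them with ROUGE-1. The snippet below is a minimal sketch of that prediction-and-scoring loop, not the authors' released code: the checkpoint name, the input marking scheme, and the example sentence are assumptions for illustration; the actual setup is defined in the linked repository.

# Minimal sketch (assumed setup, not the DReD release) of scoring a T5-generated
# relation description with ROUGE-1.
from transformers import T5ForConditionalGeneration, T5Tokenizer
from rouge_score import rouge_scorer

# Assumed base checkpoint; the paper fine-tunes T5 on DReD.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical input format: the two noun phrases are marked inline for the model.
source = "describe relation: <np1> the committee </np1> approved <np2> the budget </np2>"
inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
prediction = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Compare the generated description against a reference annotation with ROUGE-1.
reference = "the committee approved the budget"
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
print(prediction, scorer.score(reference, prediction)["rouge1"].fmeasure)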
