SIRIUS-LTG-UiO at SemEval-2018 Task 7: Convolutional Neural Networks with Shortest Dependency Paths for Semantic Relation Extraction and Classification in Scientific Papers

Farhad Nooralahzadeh, Lilja Øvrelid, Jan Tore Lønning

doi:10.18653/v1/s18-1128


Open Access


Abstract

This article presents the SIRIUS-LTG-UiO system for SemEval-2018 Task 7 on Semantic Relation Extraction and Classification in Scientific Papers. First, we extract the shortest dependency path (sdp) between two entities; we then introduce a convolutional neural network (CNN) that takes the shortest dependency path embeddings as input and performs relation classification with differing objectives for each subtask of the shared task. This approach achieved overall F1 scores of 76.7 and 83.2 for relation classification on clean and noisy data, respectively. Furthermore, for combined relation extraction and classification on clean data, it obtained F1 scores of 37.4 and 33.6 for the extraction and classification phases, respectively. Our system ranked third in all three subtasks of the shared task.

Highlights

Relation extraction and classification can be defined as follows: given a sentence in which entities are manually annotated, we aim to identify the pairs of entities that are instances of the semantic relations of interest and classify them according to a predefined set of relation types.
The use of deep neural networks for relation classification has been investigated in several recent studies (Socher et al., 2012; Lin et al., 2016; Zhou et al., 2016).
The sdp between two entities in the dependency graph captures a condensed representation of the information required to assert a relationship between them (Bunescu and Mooney, 2005). We continue this line of work and present a system based on a convolutional neural network (CNN) architecture over shortest dependency paths, combined with domain-specific word embeddings, to extract and classify semantic relations in scientific papers.
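The sdp idea can be illustrated with a minimal sketch, assuming the dependency parse is already available as a list of head indices (hand-written here; in practice it would come from a dependency parser). The function names `candidate_pairs` and `shortest_dependency_path`, the toy sentence, and its parse are hypothetical. The sketch enumerates unordered candidate pairs of annotated entities and treats the dependency tree as an undirected graph, so a breadth-first search returns the unique path between the two entity head words. It keeps only the words on the path and ignores dependency labels.

```python
from collections import deque
from itertools import combinations

def candidate_pairs(entity_heads):
    """All unordered pairs of entity head indices, in sentence order."""
    return list(combinations(sorted(entity_heads), 2))

def shortest_dependency_path(heads, source, target):
    """Return the token indices on the path from source to target.

    heads[i] is the index of token i's syntactic head (-1 for the root).
    The tree is treated as an undirected graph and searched breadth-first.
    """
    neighbours = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].append(head)
            neighbours[head].append(child)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if target not in previous:
        return []  # unreachable; cannot happen in a well-formed tree
    path = []
    node = target
    while node is not None:  # walk back from target to source
        path.append(node)
        node = previous[node]
    return path[::-1]

# Hypothetical toy sentence with a hand-written dependency parse.
TOKENS = ["We", "trained", "the", "parser", "on", "a", "large",
          "treebank", "of", "news", "text"]
HEADS = [1, -1, 3, 1, 7, 7, 7, 1, 10, 10, 7]
ENTITIES = {"parser": 3, "treebank": 7, "news text": 10}  # entity -> head token

for e1, e2 in candidate_pairs(ENTITIES.values()):
    sdp = shortest_dependency_path(HEADS, e1, e2)
    print([TOKENS[i] for i in sdp])
```

For this parse the three paths are `parser trained treebank`, `parser trained treebank text`, and `treebank text`: the path drops the determiners, adjectives, and prepositions that do not connect the two entities.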

Summary

Introduction

Relation extraction and classification can be defined as follows: given a sentence in which entities are manually annotated, we aim to identify the pairs of entities that are instances of the semantic relations of interest and classify them according to a predefined set of relation types. The re-emergence of deep neural networks makes it possible to learn features and representations automatically for complex interpretation tasks, and these approaches have yielded impressive results for many NLP tasks. Convolutional neural networks (CNNs) have been applied effectively to extract lexical and sentence-level features for relation classification (Zhang and Wang, 2015; Lee et al., 2017; Nguyen and Grishman, 2015). These works take whole sentences, or the context between the two target entities, as input to the CNN. We continue this line of work and present a system based on a CNN architecture over shortest dependency paths, combined with domain-specific word embeddings, to extract and classify semantic relations in scientific papers.
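A CNN over shortest dependency paths can be sketched as a single convolutional layer over the sequence of sdp word embeddings, followed by ReLU, max-over-time pooling, and a softmax over relation labels. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the helper names `conv1d_maxpool` and `classify`, the filter width, embedding size, number of filters, and the random untrained weights are all hypothetical, and the toy embeddings stand in for the domain-specific word embeddings. The labels are the six relation types of SemEval-2018 Task 7; with untrained weights the predicted label is meaningless, so only the data flow is illustrated.

```python
import math
import random

LABELS = ["USAGE", "RESULT", "MODEL-FEATURE", "PART_WHOLE", "TOPIC", "COMPARE"]
DIM, NUM_FILTERS, WINDOW = 4, 3, 2  # toy sizes, not the system's hyperparameters

def conv1d_maxpool(embeddings, filters, bias, window):
    """Slide each filter over every run of `window` consecutive embeddings,
    apply ReLU, then keep the maximum over positions (max-over-time pooling)."""
    dim = len(embeddings[0])
    # Zero-pad short paths so at least one full window exists.
    seq = embeddings + [[0.0] * dim for _ in range(max(0, window - len(embeddings)))]
    features = []
    for weights, b in zip(filters, bias):
        best = 0.0  # starting at 0 applies ReLU: negative activations are clipped
        for start in range(len(seq) - window + 1):
            flat = [x for row in seq[start:start + window] for x in row]
            activation = sum(w * x for w, x in zip(weights, flat)) + b
            best = max(best, activation)
        features.append(best)
    return features

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(path_tokens, emb, filters, bias, out_w, out_b):
    """Embed the sdp tokens, extract CNN features, score each relation label."""
    x = [emb[t] for t in path_tokens]
    h = conv1d_maxpool(x, filters, bias, WINDOW)
    scores = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(out_w, out_b)]
    probs = softmax(scores)
    return LABELS[probs.index(max(probs))], probs

# Randomly initialised (untrained) parameters; a real system would learn these
# and would look words up in pretrained domain-specific embeddings instead.
rng = random.Random(0)
path = ["parser", "trained", "treebank"]
EMB = {w: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for w in path}
FILTERS = [[rng.uniform(-0.5, 0.5) for _ in range(WINDOW * DIM)] for _ in range(NUM_FILTERS)]
BIAS = [0.0] * NUM_FILTERS
OUT_W = [[rng.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)] for _ in LABELS]
OUT_B = [0.0] * len(LABELS)

label, probs = classify(path, EMB, FILTERS, BIAS, OUT_W, OUT_B)
print(label, [round(p, 3) for p in probs])
```

Because max-over-time pooling keeps one value per filter, paths of any length map to a fixed-size feature vector, which is what lets short and long sdps share the same output layer.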
