Abstract
Slot Filling, a subtask of Relation Extraction, is a key step in building structured knowledge bases usable for semantic-based information retrieval. In this work, we present a machine learning filter that aims to enhance the precision of relation extractors while minimizing the impact on recall. Our approach consists of filtering the output of relation extractors using a binary classifier. This classifier is based on a wide array of syntactic, semantic and statistical features, such as the most frequent part-of-speech patterns or the syntactic dependencies between entities. We evaluated the classifier on the 18 participating systems in the TAC KBP 2013 English Slot Filling track, an evaluation campaign that targets the extraction of 41 pre-identified relations (e.g., title, date of birth, countries of residence) related to specific named entities (persons and organizations). Our results show that the classifier is able to improve the global precision of the best 2013 system by 20.5% and improve the F1-score for 20 relations out of 33 considered.
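The filtering idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` type, the `score` field (standing in for the confidence of some binary classifier), the 0.5 threshold and the toy data are all assumptions made for illustration. It only shows how discarding low-scoring candidates can raise precision at some cost in recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One filler proposed by a relation extractor (hypothetical structure)."""
    relation: str
    filler: str
    score: float      # assumed classifier confidence that the filler is correct
    is_correct: bool  # gold judgment, used for evaluation only


def filter_candidates(candidates, threshold=0.5):
    """Keep only the candidates the classifier accepts (score >= threshold)."""
    return [c for c in candidates if c.score >= threshold]


def precision_recall_f1(candidates, total_gold):
    """Compute precision, recall and F1 against `total_gold` correct fillers."""
    true_pos = sum(c.is_correct for c in candidates)
    precision = true_pos / len(candidates) if candidates else 0.0
    recall = true_pos / total_gold if total_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy extractor output: 3 correct and 3 wrong fillers; 4 correct fillers exist in total.
raw = [
    Candidate("title", "professor", 0.9, True),
    Candidate("title", "senator", 0.8, True),
    Candidate("title", "chairman", 0.4, True),
    Candidate("title", "yesterday", 0.7, False),
    Candidate("title", "London", 0.3, False),
    Candidate("title", "the", 0.2, False),
]
TOTAL_GOLD = 4

before = precision_recall_f1(raw, TOTAL_GOLD)
after = precision_recall_f1(filter_candidates(raw), TOTAL_GOLD)
print("before filtering (P, R, F1):", before)
print("after filtering  (P, R, F1):", after)
```

On this toy data the filter removes two wrong fillers and one correct one, so precision rises while recall drops, which is the trade-off the filter is meant to keep small.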
Highlights
In the age of structured knowledge bases such as Google Knowledge Graph [1], DBpedia [2], and the Linked Open Data cloud [3], relation extraction is becoming an important challenge for enhanced semantic search.
We focus on the Text Analysis Conference (TAC) Knowledge
The amount of training data varies widely across relations, first because some relations expect multiple values whereas others expect only a single value as filler, and second because relation extractors generate and retain candidates more easily for some relations than for others.
Summary
In the age of structured knowledge bases such as Google Knowledge Graph [1], DBpedia [2], and the Linked Open Data cloud [3], relation extraction is becoming an important challenge for enhanced semantic search. Relation extraction generally consists of extracting relations from unstructured information. This is done by identifying the meaningful links between named entities. The amount of training data varies widely across relations, first because some relations expect multiple values whereas others expect only a single value as filler, and second because relation extractors generate and retain candidates more easily for some relations than for others. Another important fact to consider is the class imbalance towards the wrong class in most relations.
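The two observations above (uneven training data per relation and a skew towards the wrong class) can be checked with a small tally. The sketch below is only an illustration under assumed inputs: the `(relation, is_correct)` pair format, the `class_balance` helper and the toy examples are hypothetical and do not come from the paper's data.

```python
from collections import Counter, defaultdict


def class_balance(examples):
    """Summarize, per relation, the number of examples and the share labeled wrong.

    `examples` is an iterable of (relation, is_correct) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for relation, is_correct in examples:
        counts[relation]["correct" if is_correct else "wrong"] += 1

    summary = {}
    for relation, c in counts.items():
        total = c["correct"] + c["wrong"]
        summary[relation] = {"total": total, "wrong_ratio": c["wrong"] / total}
    return summary


# Toy training set: a multi-valued relation with more candidates and
# a single-valued relation with fewer (illustrative labels only).
examples = [
    ("per:title", True),
    ("per:title", False),
    ("per:title", False),
    ("per:title", False),
    ("per:date_of_birth", True),
    ("per:date_of_birth", False),
]

balance = class_balance(examples)
for relation, stats in balance.items():
    print(relation, stats)
```

A summary of this kind makes it easy to see which relations have little training data or a heavy skew towards wrong fillers before a per-relation classifier is trained.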