Web-based extraction of semantic relation instances for terminology work

Jakob Halskov, Caroline Barrière

doi:10.1075/term.14.1.03hal

Abstract

This article describes the implementation and evaluation of WWW2REL, a domain-independent, pattern-based knowledge discovery system that extracts semantic relation instances from text fragments on the WWW to assist terminologists in updating or expanding existing ontologies. Unlike most comparable systems, WWW2REL can be applied to any semantic relation type and operates directly on unannotated and uncategorized WWW text snippets rather than on static repositories of academic papers from the target domain. The WWW is used for knowledge pattern (KP) discovery, KP filtering and relation instance discovery. The system is tested on four different relation types with the help of the biomedical UMLS Metathesaurus, and its output is manually evaluated by four domain experts. This evaluation shows that ranking relation instances by a measure of “knowledge pattern range” and applying two heuristics yields an average performance of 70% and 65% of the maximum possible F-score at the top 10 and top 50 instances, respectively. Importantly, the results show that much valuable information not present in the UMLS can be found through the proposed method. Finally, the article examines the domain dependence of different aspects of the proposed pattern-based knowledge discovery approach.

Full Text