Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words

TORSTEN ZESCH, IRYNA GUREVYCH

doi:10.1017/s1351324909990167

Abstract

In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature under different experimental conditions: (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing semantic relatedness scores, ranking word pairs, or solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source, compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia).

The article discusses the benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results must be interpreted carefully, because each evaluation approach captures only particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, which relies on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using only the first paragraph of a Wikipedia article, as opposed to the whole article, leads to better precision but decreases recall.

Finally, we present two systems that were developed to support the experiments presented herein and that are freely available for research purposes: (i) DEXTRACT, a software tool for semi-automatically constructing corpus-driven semantic relatedness datasets, and (ii) JWPL, a high-performance, Java-based Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
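
The vector-based measure and the evaluation tasks named in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each document (for example, a Wikipedia first paragraph or a wordnet gloss) is one concept, that a word is represented by its tf-idf weights over those concepts, and that relatedness is the cosine of two such vectors, in the style of the concept vectors of Gabrilovich and Markovitch (2007). The toy documents, the gold scores, and all function names (`build_concept_vectors`, `relatedness`, `solve_word_choice`, `evaluate_ranking`) are hypothetical. A word that never occurs in the concept space gets no score, which is one way a shorter text per concept can lower recall.

```python
import math
from collections import Counter

# Hypothetical toy concept space: each document (e.g. a first paragraph or a gloss) is one concept.
DOCUMENTS = {
    "Car": "a car is a motor vehicle with wheels used for transport on roads",
    "Automobile industry": "the automobile industry designs and builds motor vehicles and car engines",
    "Fruit": "a fruit such as an apple or banana is the sweet edible part of a plant",
    "Orchard": "an orchard is land where fruit trees such as apple trees are grown",
}

# Invented gold scores for illustration only (not taken from any real dataset).
GOLD = [
    (("car", "motor"), 3.9),
    (("car", "vehicle"), 3.5),
    (("apple", "banana"), 2.0),
    (("car", "apple"), 0.5),
    (("car", "tree"), 3.0),  # "tree" never occurs in the toy documents
]


def build_concept_vectors(documents):
    """Map each word to a sparse {concept: tf-idf weight} vector."""
    term_freqs = {c: Counter(text.lower().split()) for c, text in documents.items()}
    doc_freq = Counter()
    for tf in term_freqs.values():
        doc_freq.update(tf.keys())
    vectors = {}
    for concept, tf in term_freqs.items():
        for word, count in tf.items():
            idf = math.log(len(documents) / doc_freq[word])
            if idf > 0:  # words occurring in every concept carry no information
                vectors.setdefault(word, {})[concept] = count * idf
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)


def relatedness(vectors, w1, w2):
    """Cosine of the two concept vectors, or None if a word is not covered."""
    if w1 not in vectors or w2 not in vectors:
        return None
    return cosine(vectors[w1], vectors[w2])


def solve_word_choice(vectors, target, candidates):
    """Return the candidate most related to the target (None if none is covered)."""
    scored = [(relatedness(vectors, target, c), c) for c in candidates]
    scored = [(s, c) for s, c in scored if s is not None]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_ranking(vectors, gold):
    """Spearman's rho on the covered pairs, plus coverage (share of pairs scored)."""
    human, machine = [], []
    for (w1, w2), score in gold:
        value = relatedness(vectors, w1, w2)
        if value is not None:
            human.append(score)
            machine.append(value)
    return spearman(human, machine), len(machine) / len(gold)


vectors = build_concept_vectors(DOCUMENTS)
print(solve_word_choice(vectors, "car", ["apple", "vehicle", "banana"]))
print(evaluate_ranking(vectors, GOLD))
```

In this sketch, ranking word pairs is scored with Spearman's rank correlation over the covered pairs and reported together with coverage, while a word choice problem is solved by picking the candidate with the highest relatedness to the target. With the toy data, the word choice call picks `vehicle`, and `evaluate_ranking` can score only four of the five gold pairs (coverage 0.8), because `tree` is missing from the concept space.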

Journal: Natural Language Engineering
Publication Date: Sep 9, 2009
Citations: 105

Similar Papers

Semantic smoothing for text clustering
Jamal A Nasir ... George Tsatsaronis
Knowledge-Based Systems | VOL. 54
24 Sep 2013

SISR: System for integrating semantic relatedness and similarity measures
Mohamed Ben Aouicha ... Mohamed Ali Hadj Taieb
Soft Computing | VOL. 22
21 Nov 2016

Semantic textual relatedness: A hybrid method
Muhammad Fauzan Razandi ... Eldita Febrian Selfiendi
01 Apr 2016

Predicting Semantic Similarity Between Clinical Sentence Pairs Using Transformer Models: Evaluation and Representational Analysis.
Mark Ormerod ... Jesús Martínez Del Rincón
JMIR Medical Informatics | VOL. 9
26 May 2021
