Constructing a poor man’s wordnet in a resource-rich world

Darja Fišer, Benoît Sagot

doi:10.1007/s10579-015-9295-6

Abstract

In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.

Journal: Language Resources and Evaluation
Publication Date: Feb 11, 2015
Citations: 32

Similar Papers

Description and acquisition of multiword lexemes
Angelika Storrer ... Ulrike Schwall
01 Jan 1995

O avtomatski evalvaciji strojnega prevajanja
Darinka Verdonik ... Mirjam Sepesy Maučec
Slovenščina 2.0: empirical, applied and interdisciplinary research | VOL. 1
01 Dec 2013

Discourse-level Features for Statistical Machine Translation
01 Jan 2015

Language resources for a network-based dictionary
Veit Reuer
01 Jan 2004
