Abstract
Syntax-based distributional models of lexical semantics provide a flexible and linguistically adequate representation of co-occurrence information. However, their construction requires large, accurately parsed corpora, which are unavailable for most languages. In this paper, we develop a number of methods to overcome this obstacle. We describe (a) a crosslingual approach that constructs a syntax-based model for a new language requiring only an English resource and a translation lexicon; and (b) multilingual approaches that combine crosslingual with monolingual information, subject to availability. We evaluate on two lexical semantic benchmarks in German and Croatian. We find that the models exhibit complementary profiles: crosslingual models yield higher accuracies while monolingual models provide better coverage. In addition, we show that simple multilingual models can successfully combine their strengths.
Highlights
Building on the Distributional Hypothesis (Harris, 1954; Miller and Charles, 1991), which states that words occurring in similar contexts are similar in meaning, distributional semantic models (DSMs) represent a word's meaning via its occurrences in context in large corpora.
Since the nature of the translation is not indicated in the translation lexicon, we exploit typical redundancies in the source Distributional Memory (DM), which often contains "quasi-synonymous" edges that express the same relation with different words, e.g., ⟨book, obj, read⟩ and ⟨novel, obj, read⟩ (a toy sketch follows these highlights).
Does not require parallel or comparable corpora. Note that translation lexicons such as the ones we use can be extracted from comparable corpora (Rapp, 1999; Vulić and Moens, 2012, and many others), though few papers are concerned with translation at the level of semantic relations, as we are.
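To make the edge-translation idea concrete, here is a minimal, hypothetical Python sketch: an ambiguous one-to-many lexicon projects each source DM edge onto candidate target-language edges, and quasi-synonymous source edges reinforce the same target edge while spurious candidates receive little weight. The toy edges, lexicon entries, and uniform weight-splitting heuristic are illustrative assumptions, not the paper's actual data or scoring function.

```python
# Hypothetical sketch of crosslingual DM edge translation (toy data).
from collections import Counter
from itertools import product

# Source DM edges as (word1, relation, word2, corpus-derived weight).
edges = [
    ("book", "obj", "read", 8.0),
    ("book", "obj", "peruse", 2.0),  # quasi-synonymous with the edge above
    ("novel", "obj", "read", 5.0),
]

# One-to-many translation lexicon (English -> German), no sense labels.
lexicon = {
    "book": ["Buch"],
    "novel": ["Roman"],
    "read": ["lesen"],
    "peruse": ["lesen", "durchlesen"],  # ambiguous entry
}

def translate_edges(edges, lexicon):
    """Project each source edge into the target language. When the lexicon
    offers several translations, split the edge weight uniformly over the
    candidates, so that redundant source edges reinforce shared targets."""
    target = Counter()
    for w1, rel, w2, weight in edges:
        candidates = list(product(lexicon.get(w1, []), lexicon.get(w2, [])))
        if not candidates:
            continue
        for t1, t2 in candidates:
            target[(t1, rel, t2)] += weight / len(candidates)
    return target

for edge, score in translate_edges(edges, lexicon).most_common():
    print(edge, round(score, 2))
# (Buch, obj, lesen) accumulates weight from both the "read" and "peruse"
# edges, while the spurious (Buch, obj, durchlesen) stays weak.
```

The design point the sketch illustrates is that redundancy in the source model, rather than any target-side resource, provides the evidence for ranking candidate translated edges.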
Summary
Building on the Distributional Hypothesis (Harris, 1954; Miller and Charles, 1991), which states that words occurring in similar contexts are similar in meaning, distributional semantic models (DSMs) represent a word's meaning via its occurrences in context in large corpora. A notable subclass of DSMs is syntax-based models (Lin, 1998; Baroni and Lenci, 2010), which use (lexicalized) syntactic relations as dimensions. They are able to model more fine-grained distinctions than word spaces and have been found to be useful for tasks such as selectional preference learning (Erk et al., 2010), verb class induction (Schulte im Walde, 2006), analogical reasoning (Turney, 2006), and alternation discovery (Joanis et al., 2006). The paper concludes with related work (Section 8) and a general discussion (Section 9).
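As a brief illustration of how a syntax-based model differs from a plain word space, the hypothetical Python sketch below builds word vectors whose dimensions are lexicalized syntactic relations, i.e., (relation, context word) pairs extracted from dependency triples, and compares them with cosine similarity. The triples and counts are invented for illustration and do not come from the paper.

```python
# Minimal sketch of a syntax-based DSM: words are vectors over
# (relation, context word) dimensions derived from dependency triples.
import math
from collections import defaultdict

# Toy dependency triples with co-occurrence counts.
triples = [
    ("book", "obj", "read", 10),
    ("book", "obj", "write", 7),
    ("novel", "obj", "read", 6),
    ("novel", "obj", "write", 4),
    ("car", "obj", "drive", 9),
]

# Vector space: word -> {(relation, context_word): count}
vectors = defaultdict(lambda: defaultdict(float))
for word, rel, ctx, count in triples:
    vectors[word][(rel, ctx)] += count

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(cosine(vectors["book"], vectors["novel"]))  # high: shared syntactic dims
print(cosine(vectors["book"], vectors["car"]))    # 0.0: no shared dimensions
```

Because each dimension records both the relation and the context word, such a space can distinguish, for example, things that are read from things that read, a distinction a bag-of-words space collapses.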