Extraction of multi-word expressions from small parallel corpora

Yulia Tsvetkov, Shuly Wintner

doi:10.1017/s1351324912000101

Abstract

We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance.
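The idea of using 1:1 alignments as anchors can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `propose_candidates`, the representation of an alignment as `(source_index, target_index)` pairs, and the toy English-French sentence pair are all assumptions made for illustration. The ranking and filtering step that uses a monolingual corpus is omitted.

```python
from collections import Counter


def propose_candidates(src_tokens, tgt_tokens, alignment):
    """Propose MWE candidates from the gaps between 1:1 alignment anchors.

    alignment: iterable of (source_index, target_index) links (assumed format).
    Returns a list of (source_span, target_span) strings.
    """
    links = list(alignment)
    src_deg = Counter(s for s, _ in links)
    tgt_deg = Counter(t for _, t in links)

    # A 1:1 anchor is a link whose source word and target word
    # each take part in no other link.
    anchors = sorted(
        (s, t) for s, t in links if src_deg[s] == 1 and tgt_deg[t] == 1
    )

    # Sentinel anchors mark the sentence boundaries.
    bounds = [(-1, -1)] + anchors + [(len(src_tokens), len(tgt_tokens))]

    candidates = []
    for (s0, t0), (s1, t1) in zip(bounds, bounds[1:]):
        src_span = src_tokens[s0 + 1:s1]
        # Simplification: crossing anchors yield an empty target span.
        tgt_span = tgt_tokens[t0 + 1:t1]
        if len(src_span) >= 2:  # keep only multi-word source spans
            candidates.append((" ".join(src_span), " ".join(tgt_span)))
    return candidates


# Toy example (hypothetical data): "kicked" and "bucket" both link to "mort",
# so neither link is 1:1, and "the" is left unaligned.
src = ["he", "kicked", "the", "bucket", "yesterday"]
tgt = ["il", "est", "mort", "hier"]
links = [(0, 0), (1, 2), (3, 2), (4, 3)]
result = propose_candidates(src, tgt, links)
print(result)
```

In this toy case the only anchors are he/il and yesterday/hier, so the misaligned material between them is proposed as a single candidate together with the target words in the same gap.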

Journal: Natural Language Engineering
Publication Date: Mar 21, 2012
Citations: 16

Similar Papers

Statistics based evaluation of English Multi-Word Expressions
Rakhi Joon ... Archana Singhal
International Journal of Engineering and Advanced Technology | VOL. 9
30 Oct 2019

Integrating morphology with multi-word expression processing in Turkish
Kemal Oflazer ... Özlem Çetinoğlu
01 Jan 2004

Comparing and combining a semantic tagger and a statistical tool for MWE extraction
Scott Songlin Piao ... Tony Mcenery
Computer Speech & Language | VOL. 19
18 Mar 2005

Role of Lexical and Syntactic Fixedness in Acquisition of Hindi MWEs
Rakhi Joon ... Archana Singhal
01 Jan 2019
