Abstract

We predict the compositionality of multi-word expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into multiple languages, as compared to simply using distributional similarity in the source language. We further find that string similarity complements distributional similarity. © 2014 Association for Computational Linguistics.
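The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes each word or expression is represented by a sparse context-count vector built from its translations in a given language. The names `cosine` and `compositionality`, the `weight` parameter, and the plain averaging over languages are illustrative assumptions, not the authors' implementation. The toy vectors are invented for demonstration only.

```python
from collections import Counter
from math import sqrt


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compositionality(vectors_by_lang: dict, weight: float = 0.5) -> float:
    """Average, over languages, a weighted mix of the two
    component-to-expression similarities.

    vectors_by_lang maps a language code to a triple of vectors:
    (expression, first component, second component).
    The weighting and the averaging are assumptions made for this sketch.
    """
    scores = []
    for mwe, comp1, comp2 in vectors_by_lang.values():
        s1 = cosine(mwe, comp1)
        s2 = cosine(mwe, comp2)
        scores.append(weight * s1 + (1 - weight) * s2)
    return sum(scores) / len(scores)


# Hypothetical toy data: the expression shares no context with either
# component in either language, so it scores as non-compositional.
toy_idiomatic = {
    "lang_a": (Counter({"academic": 2, "elite": 1}),
               Counter({"elephant": 2, "tusk": 1}),
               Counter({"building": 2, "tall": 1})),
    "lang_b": (Counter({"scholar": 1}),
               Counter({"white": 1}),
               Counter({"castle": 1})),
}

# Hypothetical toy data: the expression shares context with both components.
toy_compositional = {
    "lang_a": (Counter({"water": 2, "drink": 1}),
               Counter({"water": 2, "wet": 1}),
               Counter({"drink": 1, "water": 1})),
}

idiomatic_score = compositionality(toy_idiomatic)
compositional_score = compositionality(toy_compositional)
```

A higher score suggests that the expression's contexts resemble those of its components, which is the intuition behind treating it as more compositional. Combining this with a string-similarity signal, as the abstract mentions, would need an additional scoring term that is not shown here.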

Highlights

  • Multiword expressions are combinations of words which are lexically, syntactically, semantically or statistically idiosyncratic (Sag et al., 2002; Baldwin and Kim, 2009)

  • Multiple languages are generally used, but more languages are used for English verb particle constructions (VPCs) than for either of the noun compound datasets

  • English noun compounds are relatively easy to identify in a corpus, because the components occur sequentially, and the only morphological variation is in noun number


Summary

Introduction

Multiword expressions (hereafter MWEs) are combinations of words which are lexically, syntactically, semantically or statistically idiosyncratic (Sag et al., 2002; Baldwin and Kim, 2009). Considerably less work has addressed the task of predicting the meaning of MWEs, especially in non-English languages. As a step in this direction, the focus of this study is on predicting the compositionality of MWEs. An MWE is fully compositional if its meaning is predictable from its component words, and it is non-compositional (or idiomatic) if not. As an example of its practical value, Carpuat and Diab (2010) proposed two strategies for integrating MWEs into statistical machine translation. They show that even a large-scale bilingual corpus cannot capture all the information needed to translate MWEs, and that adding the ability to model the compositionality of MWEs to their system improved translation quality. Similarly, when searching for documents related to "ivory tower", we are almost certainly not interested in documents relating to elephant tusks.

