Abstract

Proper identification of collocations (and more generally of multiword expressions, MWEs) is an important qualitative step for several NLP applications, and particularly so for translation. Since many MWEs cannot be translated literally, failure to identify them yields at best inaccurate translation. This paper is mostly concerned with collocations. We will show how they differ from other types of MWEs and how they can be successfully parsed and translated by means of a grammar-based parser and translator.

Highlights

  • Proper identification of collocations, and more generally of multiword expressions (MWEs), is of critical importance for many NLP applications, notably translation

  • We describe a comprehensive multilingual translation system which combines a deep syntactic parser (including a collocation detection component and an anaphora resolution mechanism) with an information-rich lexical database containing monolingual lexical units as well as bilingual data, i.e., correspondences between lexical items of source and target languages

  • Multiword expressions, and in particular collocations, constitute an important aspect of natural language and must be treated adequately by natural language processing systems, both because of the high frequency of MWEs in most documents and, in the case of translation, because they usually cannot be translated literally

Summary

INTRODUCTION

Proper identification of collocations, and more generally of multiword expressions (MWEs), is of critical importance for many NLP applications, notably translation. Examples (1d-e) display the verb-object collocations spend-money and break-record, but because of syntactic transformations (passive in Example (1d), so-called tough-movement in Example (1e)) the two terms appear in reverse order. Such examples clearly show the usefulness of syntactic knowledge for a precise identification of collocations. The paper is organized as follows: we will briefly review the main distinctions between the most common types of MWEs. Section 3 will focus on the parsing process and show how collocations can be useful with respect to categorial disambiguation. Though based on highly ambiguous nouns (in bold face), those collocations are rather unambiguous.

COLLOCATIONS AND MULTIWORD EXPRESSIONS
Multiword Expressions Matter for NLP
TREATMENT OF MWES IN A LINGUISTICALLY-BASED SYSTEM
TRANSLATING COLLOCATIONS
Collocation Identification
Collocation Transfer and Generation
Findings
CONCLUSION
