Abstract
This paper presents a new multilingual corpus with semantic annotation of collocations in English, Portuguese, and Spanish. The whole resource contains 155k tokens and 1,526 collocations labeled in context. The annotated examples belong to three syntactic relations (adjective-noun, verb-object, and nominal compounds) and represent 58 lexical functions in the Meaning-Text Theory (e.g., Oper, Magn, and Bon). Each collocation was annotated by three linguists, and the final resource was revised by a team of experts. The resulting corpus can serve as a basis for evaluating different approaches to collocation identification, which in turn can be useful for NLP tasks such as natural language understanding and natural language generation.
Highlights
The automatic identification of collocations, as well as other multiword expressions (MWEs), is crucial for many natural language processing (NLP) tasks, since their linguistic behaviour differs from that of other combinations of words (Mel’cuk, 1995; Sag et al., 2002; Ramisch and Villavicencio, 2018).
In the following we present some examples of the most productive lexical functions in each pattern: Adjective-noun: collocations where the adjective has a function of intensification or attenuation (Magn: high priority, or AntiMagn: weak resource), expresses a positive or negative consideration from the speaker (Bon: great event, AntiBon: unfortunate mistake), or conveys a specific sense (NonStandard) in combination with the noun (Mel’cuk, 1996).
This paper presented a multilingual corpus with manual annotation of collocations and their lexical functions in English, Portuguese, and Spanish.
Summary
The automatic identification of collocations, as well as other multiword expressions (MWEs), is crucial for many natural language processing (NLP) tasks, since their linguistic behaviour differs from that of other combinations of words (Mel’cuk, 1995; Sag et al., 2002; Ramisch and Villavicencio, 2018). Approaches to natural language generation may take advantage of collocational information to produce natural utterances with the desired meanings (Wanner et al., 2010; Lareau et al., 2011). The concept of collocation was formalized in the Meaning-Text Theory as a combination of two lexical units (LUs) where one of them (the BASE, e.g., attention in the collocation pay attention) is freely selected due to its meaning, while the selection of the other one (the COLLOCATE, e.g., [to] pay) is restricted by the former (Mel’cuk, 1995). Under this theory, lexical functions (LFs) represent a relation between an LU (the base) and a set of expressions (the potential collocates) (Mel’cuk, 1996, 1998; Wanner, 1996). For example, the adjective-noun collocation loud screech can be represented as Magn(screech) = loud, where the lexical function Magn denotes ‘intensification’.
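The notation LF(base) = collocate can be read as a lookup from a (lexical function, base) pair to a set of potential collocates. The following is a minimal Python sketch of that reading, not the format of the corpus itself: the class name LexicalFunctionTable and its methods are hypothetical, and the entries are limited to the adjective-noun examples quoted in this summary.

```python
from collections import defaultdict


class LexicalFunctionTable:
    """Hypothetical helper: maps (lexical function, base) to a set of collocates."""

    def __init__(self):
        self._table = defaultdict(set)

    def add(self, lf, base, collocate):
        # Record that `collocate` is a value of `lf` applied to `base`.
        self._table[(lf, base)].add(collocate)

    def apply(self, lf, base):
        # Return the known collocates in sorted order; empty list if none recorded.
        return sorted(self._table.get((lf, base), set()))


# Entries taken from the examples in the text (illustrative, not corpus data).
table = LexicalFunctionTable()
table.add("Magn", "screech", "loud")          # intensification
table.add("Magn", "priority", "high")
table.add("AntiMagn", "resource", "weak")     # attenuation
table.add("Bon", "event", "great")            # positive consideration
table.add("AntiBon", "mistake", "unfortunate")  # negative consideration

print(table.apply("Magn", "screech"))
print(table.apply("Bon", "screech"))
```

Storing collocates as a set reflects the definition above, in which a lexical function relates a base to a set of potential collocates rather than to a single word; a base with no recorded value for a function simply yields an empty list.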