Abstract
In this paper, we present an empirical study of how compounds are defined in English, the graded nature of compoundhood, and its correlation with commonly used linguistic criteria for compoundhood. We create a resource comprising a diverse set of nominal compounds identified by two trained, independent annotators in sentences from the proceedings of the European Parliament. In addition, the annotators rate the compoundhood of each identified compound and the applicability of six prominent linguistic criteria of compoundhood to each item. By comparing the two annotators' annotations, we show how controversial the definition of compounds is in practice, and we demonstrate the graded nature of compoundhood. By measuring the correlation between compoundhood and the six linguistic criteria using machine learning techniques, we show that some criteria are stronger predictors of compoundhood than others.