Sentence-final completion tasks are valuable tools for studying language processing and the predictive mechanisms that support it. Sentence-completion norms have been established for several languages, including English, Portuguese, French, and Spanish, each developed and evaluated for the language in question. However, cultural variation among native speakers of the same language calls into question whether such norms can be applied universally across speaker communities. In this study, we developed a corpus of 2925 sentence-completion norms specifically for Mexican Spanish. The corpus is distinctive in three respects. First, it is the most comprehensive set of sentence-completion norms for Mexican Spanish to date. Second, it offers a large range of experimental stimuli that vary considerably in the predictability of the sentence-final word (cloze probability/surprisal) and in the uncertainty of the sentence context (entropy). Third, the sentences vary in syntactic complexity, and the sentence-final nouns vary in concreteness/abstractness, length, and frequency. This paper describes how the sentence contexts were generated, the methodology used to collect data from a total of 1470 participants, and the analyses used to establish the sentence-completion norms. These norms contribute to fields such as linguistics, cognitive science, and machine learning by advancing our understanding of language, predictive mechanisms, knowledge representation, and context representation. The data are available on the Open Science Framework (OSF) at https://osf.io/js359/?view_only=bb1b328d37d643df903ed69bb2405ac0.
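To make the predictability measures concrete, here is a minimal sketch that computes cloze probability, surprisal, and entropy for a single sentence frame from participants' completions. It assumes the common definitions: cloze probability is the proportion of participants who produced a given word, surprisal is the negative log of that probability, and entropy is the expected surprisal over all completions for the frame. It also assumes log base 2 (values in bits) and only lowercase/whitespace normalization of responses. The function name `completion_norms` and the example responses are hypothetical and are not taken from the study's actual analysis pipeline.

```python
import math
from collections import Counter


def completion_norms(responses):
    """Return (cloze, surprisal, entropy) for one sentence frame.

    cloze:     dict word -> proportion of participants giving that word
    surprisal: dict word -> -log2(cloze probability), in bits
    entropy:   expected surprisal over all completions, in bits
    """
    # Light normalization only (assumption): real norms may also merge
    # spelling variants, lemmatize, or exclude invalid responses.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        raise ValueError("no valid responses for this sentence frame")

    counts = Counter(cleaned)
    total = len(cleaned)

    cloze = {word: n / total for word, n in counts.items()}
    surprisal = {word: -math.log2(p) for word, p in cloze.items()}
    # Entropy is the probability-weighted average surprisal; it is 0.0
    # when every participant gives the same word (a fully constraining frame).
    entropy = sum(cloze[w] * surprisal[w] for w in cloze)
    return cloze, surprisal, entropy


if __name__ == "__main__":
    # Hypothetical completions for one frame from four participants.
    cloze, surprisal, entropy = completion_norms(["pan", "pan", "Pan ", "tortilla"])
    print(cloze)      # {'pan': 0.75, 'tortilla': 0.25}
    print(surprisal)  # 'tortilla' has 2.0 bits of surprisal
    print(entropy)    # about 0.811 bits
```

Computing entropy as the weighted average of the surprisal values keeps the two measures consistent: a frame whose completions are spread across many words has high entropy, while a frame where nearly everyone gives the same word has entropy near zero and a low-surprisal dominant completion.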