Abstract

The Large Database of English Compounds (LADEC) consists of over 8,000 English words that can be parsed into two constituents that are free morphemes, making it the largest existing database specifically for use in research on compound words. Both monomorphemic (e.g., wheel) and multimorphemic (e.g., teacher) constituents were used. The items were selected from a range of sources, including CELEX, the English Lexicon Project, the British Lexicon Project, the British National Corpus, and Wordnet, and were hand-coded as compounds (e.g., snowball). Participants rated each compound in terms of how predictable its meaning is from its parts, as well as the extent to which each constituent retains its meaning in the compound. In addition, we obtained linguistic characteristics that might influence compound processing (e.g., frequency, family size, and bigram frequency). To show the usefulness of the database in investigating compound processing, we conducted a number of analyses that showed that compound processing is consistently affected by semantic transparency, as well as by many of the other variables included in LADEC. We also showed that the effects of the variables associated with the two constituents are not symmetric. In short, LADEC provides the opportunity for researchers to investigate a number of questions about compounds that have not been possible to investigate in the past, due to the lack of sufficiently large and robust datasets. In addition to directly allowing researchers to test hypotheses using the information included in LADEC, the database will contribute to future compound research by allowing better stimulus selection and matching.

Highlights

  • The Large Database of English Compounds (LADEC) consists of over 8,000 English words that can be parsed into two constituents that are free morphemes, making it the largest existing database for use in research on compound words

  • We included a variety of other measures that are useful for compound research, such as bigram frequency and family size

  • The standardized coefficients, fit statistics, and sample sizes for predicting the English Lexicon Project (ELP) lexical decision times, British Lexicon Project (BLP) lexical decision times, and ELP naming times are shown in Tables 7, 8, and 9, respectively, for the models with compound-based control variables, and in Tables 10, 11, and 12, respectively, for the models with constituent-based control variables

Summary

Introduction

The Large Database of English Compounds (LADEC) consists of over 8,000 English words that can be parsed into two constituents that are free morphemes, making it the largest existing database for use in research on compound words. The aim of the present project was to provide a large-scale database of English closed compounds (also called concatenated compounds) along with a range of their orthographic, morphological, and semantic properties that are relevant for psycholinguistic, corpus, neurolinguistic, and computational linguistic research. These properties include various measures of semantic transparency (based on human ratings and measures of association), family size, and bigram frequency at the morpheme boundary. It is useful to have sets of ratings that target different aspects of the relationship between a compound and its morphological constituents.

Behav Res (2019) 51:2152–2179
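As a rough illustration of two of the properties mentioned above, the following Python sketch computes a positional family size and the letter pair at the morpheme boundary for a handful of parsed compounds. The sample list, the function names, and the definition of family size used here (the number of compounds in the sample that share a constituent in the same position) are assumptions made for illustration; they are not taken from LADEC itself, whose operationalization may differ. Turning the boundary letter pair into a bigram frequency would additionally require corpus counts, which are not shown.

```python
from collections import Counter

# Hypothetical mini-sample of parsed compounds:
# (compound, first constituent, second constituent).
# These rows are illustrative only and are not drawn from the database files.
PARSED = [
    ("snowball", "snow", "ball"),
    ("snowman", "snow", "man"),
    ("snowflake", "snow", "flake"),
    ("football", "foot", "ball"),
    ("footprint", "foot", "print"),
]


def positional_family_sizes(parsed):
    """Count how many compounds share each constituent in each position.

    Assumed definition: family size = number of compounds in the sample
    with the same first (or second) constituent.
    """
    first = Counter(c1 for _, c1, _ in parsed)
    second = Counter(c2 for _, _, c2 in parsed)
    return first, second


def boundary_bigram(c1, c2):
    """Return the two letters that straddle the morpheme boundary."""
    return c1[-1] + c2[0]


first_family, second_family = positional_family_sizes(PARSED)

for compound, c1, c2 in PARSED:
    # Sanity check: a closed compound is the concatenation of its parts.
    assert c1 + c2 == compound
    print(compound, first_family[c1], second_family[c2], boundary_bigram(c1, c2))
```

Keeping the counts separate for the first and second positions makes it possible to examine the two constituents independently rather than pooling them.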
