Abstract

Homophone duration in spontaneous speech: A mixed-effects model

Susanne Gahl

August 2009

1. Introduction

A recent analysis of a corpus of spontaneous speech (Gahl, 2008) showed that homophone pairs differed in duration as a function of word frequency. For example, the high-frequency word time was shorter on average than its less-frequent homophone twin thyme. This effect persisted when other factors affecting word duration were statistically controlled for in a linear regression model. However, that model had several serious limitations. The goal of the current study is to overcome these limitations and to explore the determinants of word duration further.

The model presented in Gahl (2008) was a linear regression model. Its outcome variable was the average duration of the higher-frequency member of each homophone pair. Predictors were entered into the model in a blockwise fashion, in three separate blocks. The sole predictor in the first block was the average duration of the lower-frequency member of a homophone pair. The second block contained known determinants of word duration in connected speech, such as contextual speaking rate, the probability of a word given neighboring words in an utterance, orthographic regularity, and proximity to pauses. In the third and final block, the frequency of the higher-frequency member of the homophone pair (i.e., the frequency of the word whose duration was to be predicted by the model) was entered, to ascertain whether word frequency was a significant predictor of word duration over and above other known factors. That question has theoretical implications for linguistic and psycholinguistic models of language production, which are discussed in Gahl (2008).

The modeling strategy of predicting the average duration of the higher-frequency member of the homophone pair (e.g. time) from the lower-frequency member (e.g. thyme) is problematic in a number of ways.
For one thing, information specific to the lower-frequency homophone never entered into the model predictions, except indirectly, via the duration of the lower-frequency homophone. For example, while the orthographic regularity of the high-frequency homophone was a predictor in the model, the orthographic regularity of the lower-frequency homophone was not. This meant that the homophone twin was not a perfect control for the effect of phonemic content on word duration, since the duration of, for example, thyme in part reflects the orthographic regularity of that word. Properties specific to the low-frequency words were never entered into the model.

A further problem with the modeling strategy in Gahl (2008) was that information about specific word tokens was lost to the model. All of the predictors in the model represented information about word types, not word tokens. For some variables, this is as it should be. Word frequency, for example, is a property of a word type: The frequency of the word thyme is a property of the word type thyme, not of an individual token. By contrast, whether the word thyme immediately precedes a pause in an utterance is a property of a specific token of the word. Information about proximity to
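
The blockwise design of the Gahl (2008) model described in the introduction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the second block is reduced to a single stand-in predictor (speaking rate), and the variable names, coefficients, and the helper `r_squared` are hypothetical rather than taken from the original analysis. The sketch only shows how predictors are added block by block and how the fit of each nested model can be compared.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_pairs = 200  # hypothetical number of homophone pairs

# Synthetic stand-ins for type-level predictors (NOT data from the study).
twin_duration = rng.normal(300, 50, n_pairs)  # mean duration of lower-frequency twin
speaking_rate = rng.normal(0, 1, n_pairs)     # stand-in for the block 2 controls
log_frequency = rng.normal(0, 1, n_pairs)     # frequency of higher-frequency member

# Synthetic outcome: mean duration of the higher-frequency member.
duration = (
    0.8 * twin_duration
    - 10.0 * speaking_rate
    - 5.0 * log_frequency
    + rng.normal(0, 20, n_pairs)
)

# Each block adds its predictors to those already in the model.
blocks = [
    ("block 1: twin duration", [twin_duration]),
    ("block 2: + contextual controls", [speaking_rate]),
    ("block 3: + word frequency", [log_frequency]),
]

predictors = []
r2_by_block = []
for label, new_columns in blocks:
    predictors.extend(new_columns)
    r2 = r_squared(np.column_stack(predictors), duration)
    r2_by_block.append(r2)
    print(f"{label}: R^2 = {r2:.3f}")
```

Because the models are nested, R^2 cannot decrease from one block to the next; the increment at the third block corresponds to the question of whether word frequency explains variance over and above the other predictors. Note that every row here is a word type, which is exactly the limitation discussed above: token-level information has no place in this layout.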
