. . . iambic [is] the verse-form closest to speech. There is evidence of this: we speak iambics in conversation with each other very often. . . .
Aristotle, Poetics

Much work on speech rhythm has been driven far more by a desire to classify languages into rhythm categories than by the need to elucidate the actual rhythms of spoken utterances. Common approaches to speech rhythm focus, for example, on the variability of syllabic durations within utterances (Grabe & Low, 2002) or the proportion of an utterance's duration that is occupied by vowels (Ramus, Nespor, & Mehler, 1999). But these features do not specify actual rhythms (that is, the temporal patterns of syllable onsets within an utterance) and instead reduce whole languages to descriptive statistics. Knowing that English is 40% vocalic (Ramus et al., 1999) indicates little about the timing of syllable onsets within any given English utterance, even though this information may be useful in differentiating English taxonomically from languages having different types of syllable structure.

Outside of linguistics, though, representations of sentence rhythms are commonplace, and it is unclear why such representations have not had a larger impact on linguistic theories. Poetic verse, song, Shakespearean dialogue, and rap are all based on musical notions of the periodicity of syllable onsets. Consider the rhythmic transcription of the text of the children's song Twinkle Twinkle shown in Figure 1a. The transcription is organized as a two-beat cycle alternating between strong and weak beats. The relative onset-time and relative duration of every syllable in the sentence are specified, making this a true representation of a rhythm. Next, the stressed syllables of the disyllabic words fall on the strong beats of the two-beat cycle (i.e., the downbeats), whereas the unstressed syllables fall on the weak beats.
Finally, we see that even silence is specified in this transcription, in the form of the rest that sits between star and How, in this case indicating a sentence break.

Although Twinkle Twinkle is a poetic form of speech, its transcription effectively captures the basic elements of what a model of speech rhythm should describe: (a) it specifies a unit of rhythm, in this case the two-beat metrical units that make up each measure of the transcription; (b) it specifies the relative onset-time and relative duration of every syllable in the sentence; and (c) it represents not only the duration but also the weight (i.e., stress) of each syllable in the sentence, such that prominent syllables fall on strong beats. Each of these three elements has been analyzed in isolation in various models of speech rhythm (respectively, in isochrony models, metrics, and metrical phonology), but they have rarely been synthesized into a unified model. We briefly review these three traditions in phonology before mentioning the only integrated account that we know of, namely Joshua Steele's 1775 treatise An Essay Toward Establishing the Melody and Measure of Speech to be Expressed and Perpetuated by Peculiar Symbols. In our study, we report a test of a critical prediction of a musical model of speech rhythm, namely, that the production of time intervals between stressed syllables (here called prominence groups) is based on a music-like representation of metrical structure. In particular, the meter of speech can serve to stabilize the timing of prominence groups when the timing of individual syllables varies. At the same time, speech (like music) can feature changes in meter that lead to commensurate changes in the timing of prominence groups.

Isochrony Models

The first issue for speech rhythm relates to specifying a unit of rhythm. Lloyd James (1940, quoted in Pike, 1945) contrasted languages having a rhythm similar to a machine gun with those having a rhythm similar to Morse code.
Pike (1945) classified such languages as syllable-timed and stress-timed, respectively, a categorization that is often referred to as the rhythm class hypothesis (Abercrombie, 1967; Grabe & Low, 2002). …