Abstract
Lower modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to be the rhythmic backbone of speech, facilitating speech comprehension through neuronal entrainment at δ- and θ-rates (these rates are comparable to the foot and syllable rates phonetically). The jaw acts as a carrier articulator that regulates mouth opening in a quasi-cyclical way, which gives rise to the low-frequency modulations as a physical consequence. This paper describes a method that uses spectral coherence to examine the joint roles of jaw oscillation and ENV in realizing speech rhythm. Relative powers in the frequency bands of the coherence corresponding to the δ- and θ-oscillations (notated %δ and %θ, respectively) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicity carried by both the acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used for the proof of concept. For an initial analysis, %δ and %θ were regressed on utterance duration. Results showed that the degrees of foot- and syllable-sized rhythmicity differ and are contingent upon utterance length.
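The quantities %δ and %θ can be illustrated with a minimal sketch. It is not the authors' implementation: the band limits (δ as 1–4 Hz, θ as 4–8 Hz), the 0–10 Hz range used as the "entire" coherence power, the 100 Hz sampling rate, the Welch settings, and the helper name `relative_band_power` are all assumptions made for illustration, and the two signals are synthetic stand-ins for a jaw trajectory and a speech envelope.

```python
import numpy as np
from scipy.signal import coherence

# Assumed band limits in Hz; the paper's exact cut-offs are not given here.
DELTA_BAND = (1.0, 4.0)
THETA_BAND = (4.0, 8.0)
TOTAL_RANGE = (0.0, 10.0)  # assumed range treated as the whole coherence power


def relative_band_power(freqs, cxy, band, total=TOTAL_RANGE):
    """Share of the summed coherence that falls inside `band` (hypothetical helper)."""
    in_total = (freqs >= total[0]) & (freqs <= total[1])
    in_band = (freqs >= band[0]) & (freqs < band[1])
    return float(np.sum(cxy[in_band]) / np.sum(cxy[in_total]))


# Synthetic stand-ins: both signals share a 5 Hz (syllable-rate) component
# and carry independent noise. Real data would be a jaw trajectory and ENV.
fs = 100.0                       # assumed common sampling rate in Hz
t = np.arange(0, 60.0, 1.0 / fs)
rng = np.random.default_rng(0)
shared = np.sin(2 * np.pi * 5.0 * t)
jaw = shared + rng.normal(size=t.size)
env = 0.8 * shared + rng.normal(size=t.size)

# Magnitude-squared coherence estimated with Welch's method.
freqs, cxy = coherence(jaw, env, fs=fs, nperseg=512)

pct_delta = relative_band_power(freqs, cxy, DELTA_BAND)
pct_theta = relative_band_power(freqs, cxy, THETA_BAND)
print(f"%delta = {pct_delta:.2f}, %theta = {pct_theta:.2f}")
```

Because the shared component sits at 5 Hz, the θ share should exceed the δ share in this toy case. With real recordings, the two signals would first need to be brought to a common sampling rate, and the choice of normalization range affects the resulting percentages.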
Highlights
This paper characterizes speech rhythm in terms of the spectral coherence between jaw oscillations and speech temporal envelopes (ENV)
Lei He and Yu Zhang report an initial analysis of the relationship between utterance length and the relative powers of the δ- and θ-bands in the jaw-ENV coherence, using two English corpora: mngu0 (Richmond, Hoole, & King, 2011) and MOCHA-TIMIT (Wrench, 1999)
This paper introduces a method to characterize speech rhythm using the spectral coherence between jaw oscillation and the speech ENV, i.e. the jaw-env coherence
Summary
This paper characterizes speech rhythm in terms of the spectral coherence between jaw oscillations and speech temporal envelopes (ENV). Two frequency bands in the coherence spectrum covering the neuronal δ- and θ-rates were analyzed in terms of their relative contributions to the entire coherence power. These bands have been claimed to correspond to the foot and syllable timescales in speech and have been demonstrated to play a crucial role in neurological speech processing via brainwave-to-ENV entrainment. Coupling between jaw cycles and vocalization arose in the course of human evolution: the sonority of speech typically waxes and wanes with mouth opening and closing gestures (Ghazanfar et al., 2010; MacNeilage, 1998; Morrill, Paukner, Ferrari, & Ghazanfar, 2012). Such opening-closing alternations are temporally organized into syllable-sized units corresponding to the ENV modulations, which constitute the rhythmic “frames”; the open and closed phases are filled with vocalic and consonantal “contents”. This is the frame/content theory of speech evolution (MacNeilage, 1998). Jaw oscillation and ENV (in reference to Parseval’s theorem of energy conservation)