Characterizing speech rhythm using spectral coherence between jaw displacement and speech temporal envelope

Lei He, Yu Zhang

doi:10.3989/loquens.2020.074

Abstract

Lower modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to form the rhythmic backbone of speech, facilitating speech comprehension through neuronal entrainment at δ- and θ-rates (these rates are comparable to the foot and syllable rates phonetically). The jaw acts as a carrier articulator regulating mouth opening in a quasi-cyclical way, which corresponds to the low-frequency modulations as a physical consequence. This paper describes a method to examine the joint roles of jaw oscillation and ENV in realizing speech rhythm using spectral coherence. Relative powers in the frequency bands corresponding to the δ- and θ-oscillations in the coherence (notated as %δ and %θ, respectively) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicity carried by both the acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used for the proof of concept. %δ and %θ were regressed on utterance duration for an initial analysis. Results showed that the degrees of foot- and syllable-sized rhythmicity differ and are contingent upon utterance length.

Highlights

This paper characterizes speech rhythm in terms of the spectral coherence between jaw oscillations and speech temporal envelopes (ENV)
The paper reports an initial analysis of the relationships between utterance length and the relative powers of the δ- and θ-bands in the coherence, using two English corpora: mngu0 (Richmond, Hoole, & King, 2011) and MOCHA-TIMIT (Wrench, 1999)
This paper introduces a method to characterize speech rhythm using spectral coherence between jaw oscillation and the speech ENV, i.e. the JAW-ENV coherence

Summary

INTRODUCTION

This paper characterizes speech rhythm in terms of the spectral coherence between jaw oscillations and speech temporal envelopes (ENV). Two frequency bands in the coherence spectrum covering the neuronal δ- and θ-rates were analyzed in terms of their relative contributions to the entire coherence power. These bands have been claimed to correspond to the foot and syllable timescales in speech and have been demonstrated to play a crucial role in neurological speech processing via brainwave-to-ENV entrainment. Coupling between jaw cycles and vocalization arose in the course of human evolution: the sonority of speech typically waxes and wanes with mouth opening and closing gestures (Ghazanfar et al., 2010; MacNeilage, 1998; Morrill, Paukner, Ferrari, & Ghazanfar, 2012). Such opening-closing alternations are temporally organized into syllable-sized units corresponding to the ENV modulations, which constitute the rhythmic “frames”; the open and closed phases are filled with vocalic and consonantal “contents”. This is the frame/content theory of speech evolution (MacNeilage, 1998). Jaw oscillation and ENV (in reference to Parseval’s theorem of energy conservation)

The corpora

Calculating JAW-ENV coherence
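
As a rough illustration of the idea named in this heading, the sketch below computes a Welch-style magnitude-squared coherence between two signals and then the share of coherence falling in a δ-like and a θ-like band. It is not the authors' implementation: the sampling rate, segment length, band edges (0.5 to 4 Hz and 4 to 10 Hz), the 0.5 to 10 Hz normalization range, the function names, and the synthetic "jaw" and "envelope" signals are all assumptions made for this example, since the summary does not state the paper's actual settings. Only the Python standard library is used; in practice a routine such as `scipy.signal.coherence` would normally replace the hand-written spectra.

```python
import cmath
import math
import random


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def segment_spectra(x, seg_len, step, n_bins):
    """Windowed DFT (bins 0..n_bins-1) of each overlapping, mean-removed segment."""
    win = hann(seg_len)
    spectra = []
    for start in range(0, len(x) - seg_len + 1, step):
        seg = x[start:start + seg_len]
        mean = sum(seg) / seg_len
        seg = [(v - mean) * w for v, w in zip(seg, win)]
        spectra.append([
            sum(v * cmath.exp(-2j * math.pi * k * i / seg_len)
                for i, v in enumerate(seg))
            for k in range(n_bins)
        ])
    return spectra


def coherence(x, y, fs, seg_len, max_freq):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy) up to max_freq (Hz)."""
    step = seg_len // 2  # 50% overlap between segments
    n_bins = int(max_freq * seg_len / fs) + 1
    sx = segment_spectra(x, seg_len, step, n_bins)
    sy = segment_spectra(y, seg_len, step, n_bins)
    freqs, coh = [], []
    for k in range(n_bins):
        pxx = sum(abs(s[k]) ** 2 for s in sx)
        pyy = sum(abs(s[k]) ** 2 for s in sy)
        pxy = sum(a[k] * b[k].conjugate() for a, b in zip(sx, sy))
        freqs.append(k * fs / seg_len)
        coh.append(abs(pxy) ** 2 / (pxx * pyy) if pxx > 0 and pyy > 0 else 0.0)
    return freqs, coh


def relative_band_power(freqs, coh, band, total):
    """Percentage of summed coherence in `band` relative to the `total` range."""
    in_band = sum(c for f, c in zip(freqs, coh) if band[0] <= f < band[1])
    overall = sum(c for f, c in zip(freqs, coh) if total[0] <= f < total[1])
    return 100.0 * in_band / overall if overall > 0 else 0.0


# Synthetic stand-ins (assumed, not corpus data): a 2 Hz "foot-rate" and a
# 5 Hz "syllable-rate" component shared by both signals, plus independent noise.
FS = 100  # Hz, assumed common sampling rate of jaw trace and envelope
rng = random.Random(0)
t = [i / FS for i in range(20 * FS)]  # 20 s of signal
jaw = [math.sin(2 * math.pi * 2 * s) + 0.5 * math.sin(2 * math.pi * 5 * s)
       + rng.gauss(0, 0.3) for s in t]
env = [math.sin(2 * math.pi * 2 * s + 0.5) + 0.5 * math.sin(2 * math.pi * 5 * s + 1.0)
       + rng.gauss(0, 0.3) for s in t]

freqs, coh = coherence(jaw, env, fs=FS, seg_len=200, max_freq=10.0)

DELTA, THETA, TOTAL = (0.5, 4.0), (4.0, 10.0), (0.5, 10.0)  # assumed band edges
pct_delta = relative_band_power(freqs, coh, DELTA, TOTAL)
pct_theta = relative_band_power(freqs, coh, THETA, TOTAL)
print(f"%delta = {pct_delta:.1f}, %theta = {pct_theta:.1f}")
```

Because the two assumed bands tile the normalization range here, the two percentages sum to 100; with a wider normalization range they would not. Averaging over several segments matters: with a single segment the coherence estimate is trivially 1 at every frequency.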

DATA ANALYSES AND RESULTS

DISCUSSION


Journal: Loquens	Publication Date: Dec 30, 2020
Citations: 1	License type: CC BY 4.0


