Abstract

Speech glottal flow has been predominantly described in the time domain over the past decades, the Liljencrants–Fant (LF) model being the most widely used in speech analysis and synthesis despite its computational complexity. The causal/anti-causal linear model (LFCALM) was later introduced as a digital filter implementation of LF: a mixed-phase spectral model that uses anti-causal and causal filters to model the vocal-fold open and closed phases, respectively. To simplify computation further, a causal linear model (LFLM) describes the glottal flow with a fully causal set of filters. After expressing these three models under a single analytic formulation, we assess their perceptual consistency when they are driven by a single voice-quality parameter, Rd. All possible pairwise combinations of signals generated with six Rd levels for each model were presented to subjects, who were asked whether the two signals in each pair differed. LFLM–LFCALM pairs were judged similar when they shared the same Rd value, and LF was judged the same as LFLM and LFCALM given a consistent shift in Rd. Overall, the similarity between these models encourages the use of the simpler and more computationally efficient LFCALM and LFLM in speech synthesis applications.
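All three models are driven here by the single parameter Rd. One widely used way to turn Rd into LF timing parameters is Fant's (1995) set of regression formulas; the sketch below assumes those formulas (the paper's own mapping may differ) and normalises time to one fundamental period T0. The names `LFTiming`, `rd_to_lf` and `lf_to_rd`, and the accepted Rd range, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LFTiming:
    """LF timing parameters, normalised so that one period T0 = 1."""
    ra: float  # Ra = Ta / T0 (relative return-phase duration)
    rk: float  # Rk = (Te - Tp) / Tp (asymmetry of the flow pulse)
    rg: float  # Rg = T0 / (2 Tp) (glottal frequency relative to F0)
    tp: float  # instant of maximum glottal flow
    te: float  # instant of maximum negative flow derivative (main excitation)
    ta: float  # return-phase time constant


def rd_to_lf(rd: float) -> LFTiming:
    """Map Rd to LF timing using Fant's (1995) regression formulas (assumed).

    Assumption: this sketch only accepts 0.3 <= Rd <= 2.7, the range over
    which these regressions are usually applied.
    """
    if not 0.3 <= rd <= 2.7:
        raise ValueError(f"Rd={rd} outside the assumed range [0.3, 2.7]")
    ra = (-1.0 + 4.8 * rd) / 100.0
    rk = (22.4 + 11.8 * rd) / 100.0
    # Solve Rd = (1/0.11) (0.5 + 1.2 Rk) (Rk / (4 Rg) + Ra) for Rg.
    rg = rk / (4.0 * (0.11 * rd / (0.5 + 1.2 * rk) - ra))
    tp = 1.0 / (2.0 * rg)   # from Rg = T0 / (2 Tp) with T0 = 1
    te = tp * (1.0 + rk)    # from Rk = (Te - Tp) / Tp
    return LFTiming(ra, rk, rg, tp, te, ta=ra)


def lf_to_rd(p: LFTiming) -> float:
    """Consistency check: recompute Rd from (Ra, Rk, Rg) with Fant's definition."""
    return (0.5 + 1.2 * p.rk) * (p.rk / (4.0 * p.rg) + p.ra) / 0.11


if __name__ == "__main__":
    for rd in (0.3, 1.0, 2.7):
        p = rd_to_lf(rd)
        print(f"Rd={rd:.1f}: tp={p.tp:.3f} te={p.te:.3f} ta={p.ta:.4f} (in units of T0)")
```

Larger Rd values give a later excitation instant te (a larger open quotient) and a longer return phase ta, which is the usual tense-to-lax progression associated with Rd. The inverse function is there only to check that the derived Ra, Rk and Rg reproduce the input Rd.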

Summary

INTRODUCTION

Glottal filter impulse responses poorly match glottal flow waveforms obtained by inverse filtering or by indirect measurements such as electroglottography. This has led to the proposal of many glottal flow models (GFMs) defined in the time domain by analytic, parametric formulations of the glottal flow waveform and its derivative: Rosenberg (1971) (Rosenberg model); Hedelin (1984), Fujisaki and Ljungqvist (1986), and Klatt and Klatt (1990) (KLGLOTT88 model); Fant et al. (1985) [Liljencrants–Fant (LF) model]; Veldhuis (1998) (R++ model). It is therefore worth investigating the apparent discrepancy between GFMs such as LF and filter impulse-response models. Along this line, Doval et al. (2006) showed that LF and the other time-domain models they studied have a simple magnitude representation in the frequency domain that can be modelled with a third-order filter, as also noted by Childers and Lee (1991). Section IV summarises the results obtained: linear-filter formulations equivalent to the LF model can account for both the observed glottal formant and the glottal flow waveforms.
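To make the third-order magnitude description concrete, here is a minimal sketch assuming the common decomposition into a second-order "glottal formant" resonance cascaded with a first-order spectral-tilt low-pass. The function name `glottal_magnitude_db`, the parameters `fg`, `q` and `fa`, and the analog-prototype form are assumptions made for illustration, not the filter design used by the authors.

```python
import math


def glottal_magnitude_db(f: float, fg: float, q: float, fa: float) -> float:
    """Magnitude (dB) of a hypothetical third-order glottal spectrum model.

    A second-order low-pass resonance at fg (glottal formant, quality factor q)
    is cascaded with a first-order low-pass at fa (spectral tilt).
    The response is normalised to 0 dB at f = 0.
    """
    x = f / fg
    formant = 1.0 / complex(1.0 - x * x, x / q)  # two poles: glottal formant
    tilt = 1.0 / complex(1.0, f / fa)            # one pole: spectral tilt
    return 20.0 * math.log10(abs(formant * tilt))


if __name__ == "__main__":
    # Illustrative values only (not taken from the paper).
    fg, q, fa = 200.0, 1.0, 3000.0
    for f in (50, 100, 200, 400, 800, 1600, 3200, 6400):
        print(f"{f:5d} Hz: {glottal_magnitude_db(f, fg, q, fa):7.2f} dB")
```

Far above both fg and fa, the three poles give a roll-off of about -18 dB per octave. Because time-reversing a real impulse response only conjugates its spectrum, this magnitude is identical whether the resonance is realised causally or anti-causally; the two realisations differ only in phase.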

Glottal flow model parameters
General formulation of the open phase
Comparison between the GFM open phases
Formulation of the closed phases
Comparison between the GFM closed phases
Assessment of computational costs
Summary of the model implementation and effect of Rd
Protocol and task
Stimuli specification
Results
Effect of Rd and order
Effect of model
Effect of vowel
Remaining interactions
Findings
DISCUSSION AND CONCLUSION