Abstract

The production of some speech sounds involves the periodic modulation of a noise component (e.g., voiced fricatives in French, or aspiration noise in breathy vowels). According to the linear acoustic model of speech production, the speech signal is produced by filtering an excitation signal with a linear time-varying filter (which represents the vocal tract transfer function and sound radiation). For instance, in voiced fricatives, the excitation is a mixed signal resulting from the modulation of a frication noise source by the glottal flow. Two representations that take into account the nonstationary nature of mixed-excitation speech sounds were studied: cyclostationary analysis [using a spectral correlation (SC) estimator of the cyclic frequency-frequency spectrum] and nonstationary analysis [using a smoothed pseudo Wigner–Ville (WV) estimator of the Wigner–Ville spectrum]. The theoretical and experimental results obtained on test signals and actual speech show that some acoustic parameters can be estimated using these analysis methods (frequency of modulation using SC; time structure of the modulation spectrum and vocal tract filter transfer function using WV). Nevertheless, estimation of other acoustic parameters (for instance, the spectral density of the excitation noise) appeared difficult, if not impossible.
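
As a rough illustration of the mixed-excitation idea, the sketch below builds a test signal by modulating white noise with a periodic envelope and then recovers the modulation frequency from the spectrum of the squared signal. This is a deliberately simplified stand-in and not the SC estimator studied in the paper: a raised cosine replaces the glottal-flow modulation, the vocal tract filter is omitted, and the function names, sampling rate, modulation depth, and search band are all assumptions made for the example.

```python
import numpy as np


def modulated_noise(f0=120.0, fs=16000, duration=2.0, depth=0.8, seed=0):
    """White noise whose amplitude is modulated periodically at f0 (Hz).

    Assumption: a raised cosine stands in for the glottal-flow modulation.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    envelope = 1.0 + depth * np.cos(2.0 * np.pi * f0 * t)
    return envelope * rng.standard_normal(t.size)


def estimate_modulation_frequency(x, fs, fmin=50.0, fmax=400.0):
    """Pick the strongest periodicity in the instantaneous power x**2.

    For periodically modulated noise the expected power is periodic,
    so the spectrum of x**2 shows a line at the modulation frequency.
    """
    power = x ** 2
    power = power - power.mean()          # remove the DC component
    spectrum = np.abs(np.fft.rfft(power))
    freqs = np.fft.rfftfreq(power.size, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]


if __name__ == "__main__":
    fs = 16000
    x = modulated_noise(f0=120.0, fs=fs)
    print(estimate_modulation_frequency(x, fs))  # should be close to 120 Hz
```

The squared-signal spectrum only exposes the periodicity of the signal power; it says nothing about how that modulation is distributed across frequency, which is what a full spectral correlation estimate would add.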
