Speech analyzer using a joint estimation model of spectral envelope and fine structure

Hirokazu Kameoka, Nobutaka Ono, Shigeki Sagayama, Jonathan Le Roux

doi:10.21437/interspeech.2006-627

Abstract

We have been developing a new speech analyzer based on a parametric representation of speech governed by the F0 parameter, with practical human-machine interfaces in mind. Precise estimation of the vocal tract frequency response from a real speech signal requires an accurate estimate of the power of each component of the harmonic structure, which in turn calls for a high-precision estimate of F0. Conversely, under the empirical constraint that speech spectral envelopes are usually smooth in the power domain, half-pitch errors can largely be avoided. F0 and the envelope should therefore be estimated jointly rather than separately, through an optimal estimation of both the spectral envelope and the spectral fine structure. In this article, we introduce a new speech analysis method that uses a spectral model formed as a composite function of an envelope model and a fine structure model.

Index Terms: parametric speech analyzer, speech synthesis, pitch estimation, spectral envelope estimation.
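
The link between envelope smoothness and half-pitch errors can be illustrated with a small numerical sketch. This is not the estimation method of the paper. It is a toy example under stated assumptions: a synthetic power spectrum built from Gaussian peaks at multiples of a 200 Hz F0, an exponentially decaying envelope, and a roughness measure (mean squared second difference of log power at the candidate harmonics). All function names and parameter values are hypothetical. A 100 Hz comb also places a tooth on every observed peak, so fit to the peaks alone cannot separate the two candidates, but the envelope implied by 100 Hz alternates between peaks and the noise floor and is therefore far from smooth.

```python
import numpy as np

# Hypothetical toy spectrum (not data from the paper): harmonics of a
# 200 Hz voice under a smooth, decaying envelope, on a 1 Hz frequency grid.
TRUE_F0 = 200.0
freqs = np.arange(0.0, 4000.0, 1.0)


def envelope(f):
    """Assumed smooth spectral envelope (simple exponential decay)."""
    return np.exp(-f / 1500.0)


def toy_power_spectrum(f0, peak_width=5.0, floor=1e-6):
    """Fine structure (Gaussian peaks at k*f0) scaled by the envelope."""
    power = np.full_like(freqs, floor)
    for k in range(1, int(freqs[-1] // f0) + 1):
        peak = np.exp(-0.5 * ((freqs - k * f0) / peak_width) ** 2)
        power += envelope(k * f0) * peak
    return power


def envelope_roughness(power, f0_candidate):
    """Mean squared second difference of log power sampled at k*f0_candidate."""
    n_harm = int(freqs[-1] // f0_candidate)
    bins = [int(np.argmin(np.abs(freqs - k * f0_candidate)))
            for k in range(1, n_harm + 1)]
    log_amps = np.log(power[bins])
    return float(np.mean(np.diff(log_amps, n=2) ** 2))


power = toy_power_spectrum(TRUE_F0)
candidates = [100.0, 200.0]  # half-pitch candidate and true F0
roughness = {f0: envelope_roughness(power, f0) for f0 in candidates}
best_f0 = min(roughness, key=roughness.get)
print(roughness, best_f0)
```

In this sketch the smoothness term is evaluated after fixing each F0 candidate. The joint estimation described in the abstract instead optimizes the envelope and fine structure models together, which this example does not attempt.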
