Abstract

The objective is to recover vocal tract shape dynamics from the speech signal for vowels and fricatives. The method follows the analysis-by-synthesis paradigm and extends the method proposed by Ouni and Laprie, which uses a hypercubic articulatory table to represent the synthesis side, i.e. Maeda's articulatory model. The first major modification is the use of parallelepipeds instead of cubes: the new construction strategy subdivides the articulatory space only along the articulatory direction that gives rise to the strongest non-linearities, which substantially reduces the table size. The second major modification is the inversion of fricative sounds. In addition to the articulatory parameters, the relative location of the noise source downstream of the constriction is taken into account. This gives rise to three different articulatory codebooks, each corresponding to a relative position of the source with respect to the main constriction. The new inversion method has been evaluated on VCV (vowel-consonant-vowel) sequences. The correct vocal tract dynamics are recovered, even though the constriction area is slightly underestimated. Input data are the first three formant frequencies, but MFCCs are now being investigated since they render the high-frequency region of fricative spectra better.
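The idea of subdividing only along the direction with the strongest non-linearity can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_map` is a hypothetical two-parameter stand-in for the articulatory-to-acoustic mapping (the real method uses Maeda's model and formant frequencies), and the midpoint linearity test in `nonlinearity` is one possible criterion, not necessarily the one used in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)
Mapping = Callable[[Sequence[float]], List[float]]


def toy_map(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for an articulatory-to-acoustic mapping.

    It is non-linear in x[0] and linear in x[1] (an assumption for the demo).
    """
    return [1000.0 * x[0] ** 2 + 10.0 * x[1], 500.0 * x[1]]


def nonlinearity(f: Mapping, lo: Sequence[float], hi: Sequence[float], axis: int) -> float:
    """Deviation from linearity along one axis, measured through the box centre.

    Compares f at the centre with the average of f at the two face centres
    along `axis`; the result is zero when f is linear along that axis.
    """
    mid = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    low_pt = list(mid)
    low_pt[axis] = lo[axis]
    high_pt = list(mid)
    high_pt[axis] = hi[axis]
    f_mid, f_lo, f_hi = f(mid), f(low_pt), f(high_pt)
    return max(abs(m - (l + h) / 2.0) for m, l, h in zip(f_mid, f_lo, f_hi))


def subdivide(f: Mapping, lo: Sequence[float], hi: Sequence[float],
              tol: float, max_depth: int) -> List[Box]:
    """Recursively halve a box along its most non-linear axis only."""
    errors = [nonlinearity(f, lo, hi, k) for k in range(len(lo))]
    worst = max(range(len(lo)), key=lambda k: errors[k])
    if max_depth == 0 or errors[worst] <= tol:
        return [(list(lo), list(hi))]  # mapping is close enough to linear here
    cut = (lo[worst] + hi[worst]) / 2.0
    left_hi = list(hi)
    left_hi[worst] = cut
    right_lo = list(lo)
    right_lo[worst] = cut
    return (subdivide(f, lo, left_hi, tol, max_depth - 1)
            + subdivide(f, right_lo, hi, tol, max_depth - 1))


# Build the table over the unit square with an arbitrary demo tolerance.
leaves = subdivide(toy_map, [0.0, 0.0], [1.0, 1.0], tol=5.0, max_depth=10)
for lo, hi in leaves:
    print(lo, hi)
```

With this toy mapping, every resulting cell spans the full range of the second parameter and is narrow only along the first, so the cells are elongated boxes and not cubes. Splitting every axis at each step would instead produce many more cells for the same tolerance, which is the kind of table-size reduction the abstract describes.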
