Abstract

In steady-state voiced speech production, the vocal tract transfer function is sampled at multiples of the fundamental frequency (F0). At high F0, this sparse sampling causes two problems: (a) a gradual loss of the information that defines the spectral shape, and (b) F0-dependent distortion due to aliasing. If the shape of the spectral envelope contains lag-domain components (spatial frequencies) beyond the Nyquist limit, they are folded back about that limit and mixed with in-band components. The Nyquist limit (T0/2, where T0 = 1/F0) is set by the spacing between sampling points along the spectral envelope, and thus by F0. The distortion is therefore F0-dependent, and it grows more severe as F0 increases. Smoothing and interpolation are ineffective against this problem and cannot produce an F0-invariant pattern. A solution is proposed based on the concept of "missing feature theory," recently introduced in speech recognition. Pattern matching is restricted to the available data (the F0-spaced sample points) by means of an F0-dependent weighting function; all other points are ignored. The model is proposed in two versions: one operates on the short-term spectrum or excitation pattern, the other on the autocorrelation function. The model ensures F0-independent vowel identification, at the cost of requiring an estimate of F0.
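
The aliasing argument can be checked numerically. The sketch below is a minimal illustration under simplifying assumptions, not the model described here: a single lag-domain component of the envelope is represented as cos(2π q f), with q a lag in seconds, and it is observed only at harmonics f = k·F0. The helper names (`folded_lag`, `sample_envelope`, `max_abs_diff`) are hypothetical.

```python
import math

def folded_lag(q, f0):
    """Fold a lag-domain component q (seconds) into [0, T0/2], with T0 = 1/F0."""
    t0 = 1.0 / f0
    r = q % t0             # lags q and q + m*T0 give identical harmonic samples
    return min(r, t0 - r)  # cos is even, so lags r and T0 - r are also indistinguishable

def sample_envelope(q, f0, n_harmonics):
    """Sample the envelope component cos(2*pi*q*f) only at harmonics f = k*F0."""
    return [math.cos(2 * math.pi * q * k * f0) for k in range(1, n_harmonics + 1)]

def max_abs_diff(a, b):
    """Largest pointwise difference between two sampled patterns."""
    return max(abs(x - y) for x, y in zip(a, b))

if __name__ == "__main__":
    q_out, q_in = 0.002, 0.0005  # 2 ms and 0.5 ms lag components
    for f0 in (100.0, 400.0):
        diff = max_abs_diff(sample_envelope(q_out, f0, 20), sample_envelope(q_in, f0, 20))
        print(f"F0={f0:.0f} Hz  Nyquist={500 / f0:.2f} ms  "
              f"2 ms folds to {1000 * folded_lag(q_out, f0):.2f} ms  max diff={diff:.3f}")
```

With F0 = 400 Hz (T0 = 2.5 ms, Nyquist limit 1.25 ms), the 2 ms component folds onto 0.5 ms, and its harmonic samples coincide with those of a genuine 0.5 ms component. At F0 = 100 Hz (Nyquist limit 5 ms) the two components remain distinct.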

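The missing-feature matching step can be sketched in the same spirit. This is a minimal sketch under stated assumptions, not the published model: spectra are plain lists of dB values on a uniform frequency grid, F0 is assumed known, the F0-dependent weighting function is a hypothetical binary mask (1 at grid points on a harmonic of F0, 0 elsewhere), the distance is a weighted mean squared difference, and the two vowel templates are toy envelopes with made-up formant values. All function names are hypothetical.

```python
import math

def harmonic_weights(freqs, f0, tol=1.0):
    """F0-dependent weighting: 1 at grid points within tol Hz of a harmonic k*F0 (k >= 1), else 0."""
    weights = []
    for f in freqs:
        k = round(f / f0)
        weights.append(1.0 if k >= 1 and abs(f - k * f0) <= tol else 0.0)
    return weights

def masked_distance(observed, template, weights):
    """Weighted mean squared difference; zero-weight points (no data) are ignored."""
    den = sum(weights)
    if den == 0:
        raise ValueError("no harmonic falls on the frequency grid")
    num = sum(w * (o - t) ** 2 for o, t, w in zip(observed, template, weights))
    return num / den

def classify(observed, templates, freqs, f0, tol=1.0):
    """Label of the template closest to `observed`, compared at harmonic points only."""
    w = harmonic_weights(freqs, f0, tol)
    return min(templates, key=lambda label: masked_distance(observed, templates[label], w))

def envelope(f, formants, bandwidth=150.0, peak_db=40.0):
    """Toy spectral envelope in dB: one Gaussian bump per formant (illustrative only)."""
    return sum(peak_db * math.exp(-((f - fc) / bandwidth) ** 2) for fc in formants)

def harmonic_spectrum(freqs, formants, f0, tol=1.0):
    """Toy line spectrum: envelope value at harmonics, 0 dB between them."""
    w = harmonic_weights(freqs, f0, tol)
    return [envelope(f, formants) if wi else 0.0 for f, wi in zip(freqs, w)]

if __name__ == "__main__":
    freqs = [10.0 * i for i in range(401)]  # 0..4000 Hz in 10 Hz steps
    vowels = {"a": (700.0, 1200.0), "i": (300.0, 2300.0)}  # hypothetical formant pairs
    templates = {v: [envelope(f, fm) for f in freqs] for v, fm in vowels.items()}
    for f0 in (100.0, 250.0, 400.0):
        obs = harmonic_spectrum(freqs, vowels["i"], f0)
        print(f0, classify(obs, templates, freqs, f0))
```

Because only harmonic points enter the distance, the empty regions between harmonics never count against the correct template, so in this idealized, noise-free setting the toy spectra are matched to the right template at each F0 tried. The binary mask is the simplest possible weighting and is chosen here only for clarity.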