Formant estimation of high fundamental frequency speech

Corine Bickley

doi:10.1121/1.2023204

Abstract

Formant measurement procedures often rely on there being a low fundamental frequency. An early study [B. Lindblom, International Congress of Phonetic Sciences, 4th, Helsingfors, 1961, 189–202 (1962)] found that the mean error in formant estimation ranged from about 40 Hz to a frequency of one‐fourth the fundamental. This study compares signal processing techniques for the estimation of formant frequencies and bandwidths of synthesized and natural speech characterized by a high fundamental frequency. Utterances were synthesized [D. H. Klatt, J. Acoust. Soc. Am. 67, 971–995 (1980)] using young children's utterances as models. The spectral and durational characteristics were matched closely by manipulating the synthesizer parameters. Spectrograms, discrete Fourier transforms, linear prediction envelopes, and auditory pseudospectrograms were computed for both the synthesized and natural utterances. The accuracy of formant estimation was judged by comparing the values determined by each of these methods to the known frequencies and bandwidths of the synthesized speech. Implications for formant estimation of natural speech will be discussed. [Work supported in part by a Whitaker Health Sciences Fellowship.]
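
The evaluation idea described above, comparing estimates against the known formant values of synthesized speech, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the 10 kHz sampling rate, the three formant frequencies and bandwidths, the impulse-train source, the LPC order, and the function names `resonator`, `synthesize`, and `lpc_formants`. The resonator uses the standard second-order difference equation of a cascade formant synthesizer, and the estimator is plain autocorrelation-method linear prediction without pre-emphasis or windowing. It covers only one of the methods named in the abstract.

```python
import numpy as np

FS = 10000  # sampling rate in Hz (assumed for this sketch)


def resonator(x, freq, bw, fs=FS):
    """Second-order digital resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    c = -np.exp(-2 * np.pi * bw / fs)
    b = 2 * np.exp(-np.pi * bw / fs) * np.cos(2 * np.pi * freq / fs)
    a = 1 - b - c  # unity gain at 0 Hz
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = a * x[n]
        if n >= 1:
            y[n] += b * y[n - 1]
        if n >= 2:
            y[n] += c * y[n - 2]
    return y


def synthesize(formants, bandwidths, f0=None, n=4000, fs=FS):
    """Pass a single impulse (f0=None) or an impulse train through cascaded resonators."""
    x = np.zeros(n)
    if f0 is None:
        x[0] = 1.0
    else:
        period = int(round(fs / f0))  # pitch period in samples
        x[::period] = 1.0
    for f, bw in zip(formants, bandwidths):
        x = resonator(x, f, bw, fs)
    return x


def lpc_formants(signal, order, fs=FS):
    """Autocorrelation-method LPC; returns (frequencies, bandwidths) of the complex poles."""
    n = len(signal)
    # Autocorrelation at lags 0..order
    r = np.correlate(signal, signal, mode="full")[n - 1 : n + order]
    # Normal equations: sum_k a_k r[|i-k|] = -r[i], for i = 1..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    roots = np.roots(np.concatenate(([1.0], a)))
    roots = roots[roots.imag > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


# Hypothetical "known" synthesizer settings (not values from the study)
TRUE_F = [800.0, 1500.0, 3500.0]
TRUE_BW = [80.0, 100.0, 150.0]

for f0 in (None, 100, 400):  # None = single impulse, i.e. no harmonic sampling
    est_f, est_bw = lpc_formants(synthesize(TRUE_F, TRUE_BW, f0), order=6)
    label = "impulse" if f0 is None else f"F0={f0} Hz"
    print(label, "freqs:", np.round(est_f, 1), "bandwidths:", np.round(est_bw, 1))
```

With a single impulse as the source, the all-pole model matches the synthesizer exactly and the estimates should fall essentially on the known values. Raising the fundamental spaces the harmonics more widely, so the spectral envelope is sampled more sparsely and the estimates can drift from the true values. The size of that drift in this toy setup depends on the assumed parameters and says nothing about the results reported in the study.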
