Abstract

Spectral analysis of the Japanese vowels shows that the five vowels /a/, /e/, /i/, /o/, and /u/ of a single speaker can be well separated by their first and second formant frequencies (F1 and F2). A considerable amount of overlap is observed, however, when vowels of many speakers are plotted in the F1-F2 plane, which can be ascribed mainly to differences in the size and shape of the vocal tract. A normalizing process, based presumably on higher formant frequencies, is expected in the identification of these vowels. It is not clear, however, whether concurrent changes of pitch and higher formants are necessary in the normalization process. This paper presents a method for evaluating the roles of these parameters and describes the results obtained. Perceptual boundaries between a pair of vowels that share approximately the same ratio of F2 to F1 are defined in the F1-F2 plane, using synthetic vowels generated by a terminal analog synthesizer. The importance of pitch and higher formants is then evaluated by the extent to which their changes affect these boundaries.
The results of listening tests show that, for ordinary buzz-excited vowels, neither pitch nor higher formants alone are sufficient for perceptual normalization, and that combined changes in pitch and higher formants are necessary to counteract the changes in F1 and F2. For noise-excited vowels, on the other hand, the roles of higher formants are as important as the combined roles of pitch and higher formants in buzz-excited vowels.
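
The boundary measurement described above can be pictured with a minimal sketch. It is an illustration only: the abstract gives neither the stimulus frequencies nor the procedure for locating a boundary, so the function names, the linear spacing of stimuli, and the 50% crossover criterion are all assumptions, not the authors' stated method. The sketch places stimuli along a line of constant F2/F1 ratio and interpolates the F1 value at which listeners' responses switch from one vowel to the other.

```python
def constant_ratio_stimuli(f1_start, f1_end, ratio, steps):
    """Return (F1, F2) pairs in Hz, evenly spaced in F1, with F2 = ratio * F1.

    Hypothetical helper: the spacing and the ratio are placeholders,
    not values taken from the paper.
    """
    stimuli = []
    for i in range(steps):
        t = i / (steps - 1)
        f1 = f1_start + t * (f1_end - f1_start)
        stimuli.append((f1, ratio * f1))
    return stimuli


def boundary_f1(stimuli, prop_first):
    """Interpolate the F1 at which the proportion of 'first vowel'
    responses crosses 0.5 (assumed criterion). Returns None if the
    responses never cross 0.5.
    """
    pairs = zip(stimuli, stimuli[1:], prop_first, prop_first[1:])
    for (f1_a, _), (f1_b, _), p_a, p_b in pairs:
        if p_a != p_b and (p_a - 0.5) * (p_b - 0.5) <= 0:
            t = (0.5 - p_a) / (p_b - p_a)
            return f1_a + t * (f1_b - f1_a)
    return None
```

Under these assumptions, running `boundary_f1` once per synthesis condition (for example, with and without shifted pitch or higher formants) and comparing the returned F1 values would show how far each parameter moves the boundary.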

