Perceptuele Strukturen van Synthetische en Natuurlijke Klinkers

Georges Govaerts

doi:10.5334/pb.623

Abstract

[Perceptual Structures of Synthetic and Natural Vowels] The investigation reported here forms part of a study of the relation between acoustical dimensions of speech sounds on the one hand and some response dimensions on the other. The research aims at gaining more insight into the perceptual organization of vowels. In order to reach different levels or aspects of this perceptual organization, various experiments were designed in which the subjects' task, applied to the same stimulus materials, differed considerably. This report deals with the experiments on similarity judgements of pairs of human vowels and pairs of synthetic vowels. The synthetic vowels were generated with a five-formant parallel hardware synthesizer. The results of previous experiments on the naturalness (Fig. 1) and the identification value (Tab. 3) of both kinds of vowels suggest that the selected physical parameters (Tab. 1), measured on the human vowels, allow the generation of synthetic vowels which match the natural ones. This heightens the external validity of our results. A multidimensional analysis of the pooled similarity data resulted, for both kinds of vowels, in a predicted structure (Fig. 3) with two qualitative dimensions. The first dimension reflects the distinctive features acute vs. grave and, within these, a tense/lax distinction. The second dimension reflects the distinctive features plain vs. flat and, within these, the compact/diffuse contrast. A multidimensional analysis which takes into account inter-individual differences in perception resulted, for both kinds of vowels, in a six-dimensional structure (Fig. 4). The first, qualitative, dimension reflected the subjective importance of low versus high frequency information plus the position on the frequency scale (multiple r of D1 with the acute/grave and the plain/flat indices = .934).
The other dimensions were related to the bandwidth of the formants (r of D2 with ∑ log10 Bi = .970), the central frequency of the second formant (r of D3 with log10 CF2 = .959), the central frequency of the first formant (r of D4 with log10 CF1 = .933), the duration of the vowels (r of D5 with time in msec = .894) and the loudness (r of D6 with loudness indices = .839).
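
As an illustration only, the general idea of deriving a spatial structure from dissimilarity data and then relating a recovered dimension to acoustic parameters can be sketched as follows. This is a minimal sketch using classical (Torgerson) scaling, which is not necessarily the scaling procedure used in the study. The formant values are made up, the "dissimilarities" are computed from them rather than collected from listeners, and the function names `classical_mds` and `multiple_r` are hypothetical.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) scaling of a symmetric dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues that arise from rounding.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def multiple_r(y, predictors):
    """Multiple correlation of y with a set of predictors (least squares)."""
    x = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return float(np.corrcoef(x @ beta, y)[0, 1])


# ASSUMPTION: invented first/second formant frequencies (Hz) for five vowels.
formants_hz = np.array(
    [[300, 2300], [400, 2000], [700, 1300], [450, 900], [320, 800]],
    dtype=float,
)
log_formants = np.log10(formants_hz)

# ASSUMPTION: stand-in dissimilarities, taken as Euclidean distances in
# log-formant space instead of pooled listener judgements.
diff = log_formants[:, None, :] - log_formants[None, :, :]
dissim = np.sqrt((diff ** 2).sum(axis=-1))

coords = classical_mds(dissim, n_dims=2)

# Distances between the recovered points, for comparison with the input.
rec_diff = coords[:, None, :] - coords[None, :, :]
recovered = np.sqrt((rec_diff ** 2).sum(axis=-1))

# Multiple correlation of the first recovered dimension with both log formants.
r_d1 = multiple_r(coords[:, 0], log_formants)
print(f"multiple r of D1 with log10 formants: {r_d1:.3f}")
```

Because the stand-in dissimilarities are exact distances in a two-dimensional space, the recovered configuration reproduces them and the multiple correlation is essentially perfect. With real judgement data the fit would be lower, as the correlations reported in the abstract indicate.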