Acoustic source characteristics, across-formant integration, and speech intelligibility under competitive conditions.

Brian Roberts, Peter J Bailey, Robert J Summers

doi:10.1037/xhp0000038

Abstract

An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.
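The competitor construction described above (time-reversing the frequency and amplitude contours of F2) and the tonal (sine-wave) analogue can be illustrated with a minimal sketch. This is not the authors' synthesis code: the function names, the 10-ms frame duration, the 16-kHz sample rate, the step-wise hold of each frame's values, and the contour values themselves are all assumptions made for illustration.

```python
import math


def make_competitor_contours(freq_hz, amp):
    """Return time-reversed copies of F2's frequency and amplitude contours.

    The reversed contours define the competitor (F2C): it keeps the same set
    of frequency and amplitude values as F2, but in reverse temporal order.
    """
    if len(freq_hz) != len(amp):
        raise ValueError("frequency and amplitude contours must have equal length")
    return freq_hz[::-1], amp[::-1]


def synthesize_tonal(freq_hz, amp, frame_s=0.01, sample_rate=16000):
    """Synthesize a sine-wave analogue from frame-wise contours.

    Assumption: each frame's frequency and amplitude are held constant for
    the frame duration. Phase is accumulated sample by sample so that the
    waveform stays phase-continuous when the frequency changes.
    """
    samples_per_frame = int(round(frame_s * sample_rate))
    phase = 0.0
    out = []
    for f, a in zip(freq_hz, amp):
        step = 2.0 * math.pi * f / sample_rate  # phase advance per sample
        for _ in range(samples_per_frame):
            out.append(a * math.sin(phase))
            phase += step
    return out


# Hypothetical F2 contours (Hz, linear amplitude), for illustration only.
f2_freq = [1200.0, 1350.0, 1500.0, 1400.0]
f2_amp = [0.2, 0.5, 0.4, 0.1]

f2c_freq, f2c_amp = make_competitor_contours(f2_freq, f2_amp)
t2 = synthesize_tonal(f2_freq, f2_amp)      # tonal analogue of F2 (T2)
t2c = synthesize_tonal(f2c_freq, f2c_amp)   # tonal competitor (T2C)
```

In this sketch, a harmonic analogue (H2 or H2C) would instead require passing a harmonic source through a second-order resonator whose centre frequency follows the same contour; that step is omitted here.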

Highlights

An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics; for example, fundamental frequency (F0) differences between formants promote their segregation.
Sentence intelligibility is typically reduced when the target speech is accompanied by a competitor formant (F2C) created using the time-reversed frequency and amplitude contours of F2.
The dichotic configuration used for stimulus presentation limits the energetic masking effects of F2C, because the F1 of the target sentence is lower in frequency and more intense than F2C, and the target F2 is presented in the opposite ear.

Summary

Introduction

An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics; for example, fundamental frequency (F0) differences between formants promote their segregation. The current study explores the effects of two factors: (a) whether all formant analogues comprising the target utterance are synthesized using the same source characteristics, and (b) whether analogues of extraneous formants are present. These two factors are likely to interact, as it is well established that the factors governing perceptual organization are generally revealed most clearly where competition operates (e.g., Barker & Cooke, 1999; Darwin, 1981). To our knowledge, no previous studies have used sentence-length materials of this type or compared the effect of an across-formant mismatch in source characteristics in target-only and target-plus-interferer contexts.

Methods

Results

Conclusion