Abstract

Cross-language McGurk Effects are used to investigate the locus of auditory-visual speech integration. Experiment 1 used the fact that [ŋ], as in 'sing', is phonotactically legal in word-final position in both English and Thai, but in word-initial position only in Thai. English- and Thai-language participants were tested for 'n' perception from auditory [m]/visual [ŋ] (A[m]V[ŋ]) in word-initial and word-final positions. Despite English speakers' native-language bias to label word-initial [ŋ] as 'n', the incidence of 'n' percepts to A[m]V[ŋ] was equivalent for English and Thai speakers in both final and initial positions. Experiment 2 used the facts that (i) [ð], as in 'that', is not present in Japanese, and (ii) English speakers respond more often with 'tha' than 'da' to A[ba]V[ga], but more often with 'di' than 'thi' to A[bi]V[gi]. English and three groups of Japanese-language participants (Beginner, Intermediate, and Advanced English knowledge) were presented with A[ba]V[ga] and A[bi]V[gi] produced by an English (Experiment 2a) or a Japanese (Experiment 2b) speaker. Despite Japanese participants' native-language bias to perceive 'd' more often than 'th', the four groups showed a similar phonetic-level effect of [a]/[i] vowel context × 'th' vs. 'd' responses to A[b]V[g] presentations. In Experiment 2b this phonetic-level interaction held, but was more one-sided, as very few 'th' responses were evident, even in Australian English participants. Results are discussed in terms of a phonetic plus postcategorical model, in which incoming auditory and visual information is integrated at a phonetic level, after which there are post-categorical phonemic influences.


Introduction

Visual (face and lip) information plays an important role in speech perception, in noisy but otherwise natural conditions (Sumby and Pollack, 1954), and in clear but unnatural conditions when auditory and visual speech components are mismatched (Dodd, 1977; McGurk and MacDonald, 1976). These pioneering mismatching studies, plus four decades of research on what is called the McGurk Effect, show that visual speech information is combined with auditory speech information, albeit unconsciously, whenever it is available (though the nature of the combination may differ from that in more natural conditions; see Alsius et al., this issue). The issue of phonotactically illegal responses such as 'bga' is taken up further in the introduction to Experiment 1.

