This research offers a multidisciplinary perspective on the factors that influence how auditory and visual information are integrated in speech perception, and on the nature of the mental representations against which speech input is matched for identification. Most previous studies of L2 speech perception have focused on only one source of input, the auditory modality. In this study, however, the occurrence of the McGurk effect (the influence of visual, or lip-read, information on speech perception) pointed to the power of visible speech in face-to-face communication for L2 learners as well as native speakers. Participants were advanced ESL learners from four L1 backgrounds (Japanese, Korean, Spanish, and Malay) and native speakers of English. In the experimental design, six CV syllables containing /p, f, w, r, t, k/ were combined on videotape to create concordant and discordant conditions (i.e., conditions in which the audiovisual cues were matched or mismatched). This design permits input to be tagged by source so that the relative contribution of each modality to the perceptual outcome can be evaluated. Results indicated that the learners' L1s influenced the relative information value of the auditory and visual cues, and thus their contribution to a single percept. The Japanese and Korean learners' identification accuracy for /f/ and /r/ increased when matching visual cues were present. Learners evidently attended to visual cues to identify speech sounds even though they had received no explicit training. Drawing on the literature on L2 phonology, speechreading, speech perception, attention, similarity, and categorization, this work demonstrates the need for a model of L2 speech acquisition within a cognitive framework that accounts for bimodal input in the broader context of spoken language processing. The findings also point to the need for L2 perceptual training that enhances the information value of visual cues as a second channel of input.