Abstract

Research has shown that visual speech information can improve accuracy in the identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated “early” or “late.” We propose that “early” integration most naturally corresponds to coactive processing, whereas “late” integration corresponds to separate decisions parallel processing. We implemented the double factorial paradigm in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture and decision rule, and to provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to assess audiovisual integration efficiency in a more ecologically valid way by including lower auditory signal-to-noise (S/N) ratios and a larger response set size. Results from the pilot study support a separate decisions parallel, late integration model. Results from both studies showed that capacity was severely limited at high auditory S/N ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is critically affected by the S/N ratio.
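
For concreteness, the sketch below shows how a response-time measure of workload capacity (integration efficiency) of this kind is commonly estimated, using the OR-task capacity coefficient of Townsend and Nozawa (1995). The function names and the simulated gamma response times are illustrative assumptions, and whether the present study computed capacity in exactly this form is not stated in the abstract.

```python
# A minimal sketch, assuming the standard OR-task capacity coefficient of
# Townsend and Nozawa (1995); this is not the authors' code, and whether the
# study used exactly this form of the measure is an assumption.
#   C(t) = H_AV(t) / [H_A(t) + H_V(t)],  with H(t) = -log S(t)
# C(t) = 1 indicates unlimited capacity (the independent parallel benchmark),
# C(t) < 1 limited capacity, and C(t) > 1 super capacity.

import numpy as np

def cumulative_hazard(rts, t_grid):
    """Estimate H(t) = -log S(t) from a sample of correct response times."""
    rts = np.sort(np.asarray(rts))
    # Empirical survivor function S(t) = P(RT > t)
    survivor = 1.0 - np.searchsorted(rts, t_grid, side="right") / len(rts)
    survivor = np.clip(survivor, 1e-6, 1 - 1e-6)  # keep the logs finite
    return -np.log(survivor)

def capacity_or(rt_av, rt_a, rt_v, t_grid):
    """Capacity coefficient C(t) for a first-terminating (OR) design."""
    return cumulative_hazard(rt_av, t_grid) / (
        cumulative_hazard(rt_a, t_grid) + cumulative_hazard(rt_v, t_grid))

# Hypothetical usage with simulated RTs (seconds); real data would come from
# audiovisual (AV), auditory-only (A), and visual-only (V) trials.
rng = np.random.default_rng(0)
t = np.linspace(0.4, 1.2, 50)
c_t = capacity_or(rng.gamma(8, 0.06, 500),   # AV trials
                  rng.gamma(8, 0.08, 500),   # A-only trials
                  rng.gamma(8, 0.09, 500),   # V-only trials
                  t)
print(np.round(c_t[:5], 3))
```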

Highlights

  • When someone utilizes lip-reading to take advantage of both auditory and visual modalities, how is this accomplished? Research shows that even normal-hearing individuals benefit in accuracy from bimodal information in low-to-moderate signal-to-noise ratio conditions (e.g., Sumby and Pollack, 1954)

  • Data from three experimental conditions are presented in Table 2: an auditory signal-to-noise ratio of −18 dB, an S/N ratio of −12 dB, and a clear auditory condition with a low-noise signal approximating optimal listening conditions

  • Although the authors’ methodology did not assay issues related to parallel versus coactive processing, their findings appear to be consistent with our data supporting an interactive parallel account of audiovisual integration

Introduction

When someone utilizes lip-reading to take advantage of both auditory and visual modalities, how is this accomplished? Research shows that even normal-hearing individuals benefit in accuracy from bimodal information in low-to-moderate signal-to-noise ratio conditions (e.g., Sumby and Pollack, 1954). The real-time processing mechanisms of lip-reading, and how they relate to auditory word perception, remain opaque (see Jesse and Massaro, 2010, for a recent study using the gating paradigm). A problem of particular interest concerns whether the information from the auditory and visual modalities is combined, or “integrated,” in the early stages of processing, or rather in later stages, after phoneme, syllable, or even word recognition. These issues are important in building a theory of multimodal speech perception, since they must be specified in any real-time processing system.
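
Because “early” versus “late” integration is framed here as coactive versus separate decisions parallel processing, the sketch below illustrates the survivor interaction contrast from the double factorial paradigm, the response-time signature commonly used to tell these architectures apart. It is a toy illustration under stated assumptions, not the authors' analysis code.

```python
# A minimal sketch of the survivor interaction contrast (SIC) from the double
# factorial paradigm (Townsend and Nozawa, 1995):
#   SIC(t) = S_LL(t) - S_LH(t) - S_HL(t) + S_HH(t)
# where S is the RT survivor function and L/H index low/high salience of the
# auditory and visual factors. Parallel first-terminating (separate decisions)
# models predict SIC(t) >= 0, while coactive models predict a negative-then-
# positive SIC(t) with a positive mean interaction contrast (MIC). All names
# and the simulated gamma RTs below are hypothetical.

import numpy as np

def survivor(rts, t_grid):
    """Empirical survivor function S(t) = P(RT > t)."""
    rts = np.sort(np.asarray(rts))
    return 1.0 - np.searchsorted(rts, t_grid, side="right") / len(rts)

def sic(rt_hh, rt_hl, rt_lh, rt_ll, t_grid):
    """Survivor interaction contrast evaluated on a grid of time points."""
    return (survivor(rt_ll, t_grid) - survivor(rt_lh, t_grid)
            - survivor(rt_hl, t_grid) + survivor(rt_hh, t_grid))

# Hypothetical simulated RTs (seconds) for the four factorial cells:
# high/low auditory salience crossed with high/low visual salience.
rng = np.random.default_rng(0)
rt_hh = rng.gamma(7, 0.07, 400)
rt_hl = rng.gamma(8, 0.07, 400)
rt_lh = rng.gamma(8, 0.075, 400)
rt_ll = rng.gamma(9, 0.08, 400)

t = np.linspace(0.3, 1.5, 60)
sic_curve = sic(rt_hh, rt_hl, rt_lh, rt_ll, t)
# Mean interaction contrast, the companion diagnostic at the level of mean RTs
mic = (rt_ll.mean() - rt_lh.mean()) - (rt_hl.mean() - rt_hh.mean())
print(np.round(sic_curve[:5], 3), round(mic, 3))
```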
