Abstract

In the present study, we investigated multisensory gain in two ways: as the difference in speech recognition accuracy between the audio–visual (AV) and auditory-only (A) conditions, and as the difference between the event-related potentials (ERPs) evoked under the AV condition and the sum of the ERPs evoked under the A and visual-only (V) conditions, across different noise environments. Videos of a female speaker articulating Chinese monosyllabic words, accompanied by different levels of pink noise, served as the stimulus materials. The selected signal-to-noise ratios (SNRs) were −16, −12, −8, −4 and 0 dB. Speech recognition accuracy was measured under the A, V and AV conditions, and the ERPs evoked under each condition were analyzed. The behavioral results showed that the gain in recognition accuracy (AV minus A) peaked at the −12 dB SNR. The ERP results showed that, in the 130–200 ms time window over the fronto-central region, the multisensory gain (AV minus the sum of A and V) at the −12 dB SNR was significantly larger than at the other SNRs. The multisensory gains in audio–visual speech recognition at different SNRs were therefore not fully consistent with the principle of inverse effectiveness, but instead conformed to cross-modal stochastic resonance.
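For clarity, the two gain measures summarized above can be written explicitly (the symbols below are our shorthand for the quantities described in the abstract, not notation taken from the paper):

G_behav(SNR) = Acc_AV(SNR) − Acc_A(SNR)
G_ERP(t, SNR) = ERP_AV(t, SNR) − [ERP_A(t, SNR) + ERP_V(t, SNR)]

Here Acc_X denotes recognition accuracy under condition X, and ERP_X(t, SNR) the ERP amplitude at time t under condition X. The behavioral gain G_behav was maximal at −12 dB, and G_ERP was evaluated in the 130–200 ms window over fronto-central electrodes.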
