Abstract

Through auditory attentional modulation, the attended speech stream can be detected robustly even in adverse auditory scenarios, and it can be decoded from electroencephalographic (EEG) data. Speech segmentation based on relative root-mean-square (RMS) intensity can be used to estimate segmental contributions to perception in noisy conditions. High-RMS-level segments contain crucial information for speech perception. Hence, this study aimed to investigate the effect of high-RMS-level speech segments on auditory attention decoding performance under various signal-to-noise ratio (SNR) conditions. Scalp EEG signals were recorded while subjects listened to the attended speech stream in mixed speech narrated concurrently by two Mandarin speakers. The temporal response function (TRF) was used to identify the attended speech from EEG responses tracking the temporal envelopes of intact speech and of high-RMS-level speech segments alone. Auditory decoding performance was then analyzed under various SNR conditions by comparing EEG correlations to the attended and ignored speech streams. The accuracy of auditory attention decoding based on the temporal envelope of high-RMS-level speech segments was not inferior to that based on the temporal envelope of intact speech. Cortical activity correlated more strongly with attended than with ignored speech under different SNR conditions. These results suggest that EEG recordings corresponding to high-RMS-level speech segments carry crucial information for the identification and tracking of attended speech in the presence of background noise. This study also showed that, with the modulation of auditory attention, attended speech can be decoded more robustly from neural activity than from behavioral measures across a wide range of SNRs.
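The RMS-based segmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the frame length or the level threshold, so the 16 ms frame and the 0 dB threshold (frame RMS at or above the whole-signal RMS) used below are assumptions, and the function name `high_rms_mask` is hypothetical.

```python
import numpy as np


def high_rms_mask(signal, fs, frame_ms=16.0, threshold_db=0.0):
    """Mark samples that belong to high-RMS-level frames.

    A frame counts as "high RMS" when its RMS level, expressed in dB
    relative to the RMS of the whole signal, is >= threshold_db.
    frame_ms and threshold_db are illustrative assumptions, not values
    taken from the study.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = max(1, int(round(fs * frame_ms / 1000.0)))
    n_frames = len(signal) // frame_len
    overall_rms = np.sqrt(np.mean(signal ** 2))
    eps = np.finfo(float).tiny  # avoids log10(0) on silent frames

    # Samples after the last full frame are left as False.
    mask = np.zeros(len(signal), dtype=bool)
    for i in range(n_frames):
        start = i * frame_len
        stop = start + frame_len
        frame_rms = np.sqrt(np.mean(signal[start:stop] ** 2))
        rel_db = 20.0 * np.log10((frame_rms + eps) / (overall_rms + eps))
        mask[start:stop] = rel_db >= threshold_db
    return mask
```

Under these assumptions, an envelope restricted to high-RMS-level segments could be obtained as `envelope * high_rms_mask(speech, fs)`, with the remaining samples set to zero.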

Highlights

  • The human auditory system enables listeners to follow attended speakers and filter out background noises effortlessly, known as the “cocktail party” effect (Cherry, 1953)

  • The results showed that: (1) neural tracking of intact speech and of high-RMS-level segments has the same topological and morphological distributions, and the temporal response function (TRF) responses to high-RMS-level segments differed only in having weaker magnitudes than those to intact speech envelopes; (2) the speech temporal envelope of high-RMS-level segments could be used to decode auditory attention reliably, with no significant difference in the strength of cortical selectivity from the temporal envelope of intact speech; (3) lower signal-to-noise ratios (SNRs) were associated with worse neural tracking of speech, whereas the accuracy of attended speech selection was insensitive to the level of background noise

  • Neural tracking activities reflected by TRF amplitudes were worse for high-RMS-level segments than for the intact temporal envelope, indicating that each segment of the speech temporal envelope contributes to the cortical representation of attended speech
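The decoding idea summarized in these highlights (predict the EEG from each speech envelope with a TRF, then select the stream whose prediction correlates better with the recorded EEG) can be sketched as follows. This is a simplified, single-channel illustration under stated assumptions, not the study's pipeline: the ridge-regression estimator, the lag count, the regularization value, and the function names `lagged`, `fit_trf`, and `decode_attention` are all hypothetical choices made for the example.

```python
import numpy as np


def lagged(stimulus, n_lags):
    """Design matrix whose column k holds the stimulus delayed by k samples."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:n - k]
    return X


def fit_trf(stimulus, eeg, n_lags, ridge=1.0):
    """Estimate a forward TRF (envelope -> one EEG channel) by ridge regression."""
    X = lagged(stimulus, n_lags)
    eeg = np.asarray(eeg, dtype=float)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ eeg)


def decode_attention(eeg, env_a, env_b, trf):
    """Pick the envelope whose TRF prediction correlates more with the EEG."""
    n_lags = len(trf)
    r_a = np.corrcoef(lagged(env_a, n_lags) @ trf, eeg)[0, 1]
    r_b = np.corrcoef(lagged(env_b, n_lags) @ trf, eeg)[0, 1]
    return ("a" if r_a > r_b else "b"), r_a, r_b
```

In this sketch the TRF would be fitted on training data for the attended stream and then applied to held-out trials; decoding accuracy is the fraction of trials in which the attended stream yields the larger correlation.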


Summary

Introduction

The human auditory system enables listeners to follow attended speakers and filter out background noise effortlessly, a phenomenon known as the “cocktail party” effect (Cherry, 1953). Some researchers have examined neural responses to develop speech signal processing methods that improve attended-speech recognition in hearing assistance devices used in complex auditory scenes (e.g., Christensen et al., 2018; Miran et al., 2018; Somers et al., 2019). The optimal parameters of speech recognition algorithms could be determined from individual neural responses in the central auditory pathways (Loeb and Kessler, 1995). Because listeners’ intentions can be detected without verbal feedback (Miran et al., 2018), incorporating neural feedback into speech-processing algorithms and applying it in hearing prostheses (e.g., hearing aids and cochlear implants) has been considered an effective approach to improving the hearing ability of listeners with communication impairments (e.g., Mc Laughlin et al., 2012; Aroudi et al., 2019).

