Abstract

Humans show a remarkable perceptual ability to select the speech stream of interest from among multiple competing speakers. Previous studies have demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, previous AAD approaches perform poorly on short signal segments, so more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we aim to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracies of 82.8%, 86.4%, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average accuracy of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model but also outperforms state-of-the-art non-linear approaches. These results and the data visualization suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby improving AAD performance.
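To make the cross-modal attention idea concrete, the sketch below shows one plausible way to let EEG features attend to audio features and vice versa, then fuse the two directions for an attended-speaker decision. This is a minimal illustration, not the authors' CMAA implementation: the feature dimensions, number of heads, mean-pooling fusion, and the two-class output head are all assumptions made for readability.

```python
# Minimal sketch of cross-modal attention between audio and EEG features.
# NOT the authors' CMAA implementation; dimensions, layer choices, and the
# fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class CrossModalAttentionSketch(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        # EEG attends to audio: EEG features are queries, audio features are keys/values.
        self.eeg_to_audio = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Audio attends to EEG: audio features are queries, EEG features are keys/values.
        self.audio_to_eeg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Binary decision: which of the two competing speakers is attended.
        self.classifier = nn.Linear(2 * d_model, 2)

    def forward(self, eeg_feat, audio_feat):
        # eeg_feat:   (batch, time, d_model) projected EEG features
        # audio_feat: (batch, time, d_model) projected speech (e.g., envelope) features
        eeg_ctx, _ = self.eeg_to_audio(eeg_feat, audio_feat, audio_feat)
        audio_ctx, _ = self.audio_to_eeg(audio_feat, eeg_feat, eeg_feat)
        # Pool over the decision window, then fuse both attention directions.
        fused = torch.cat([eeg_ctx.mean(dim=1), audio_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)


# Example: a 2-s decision window at 64 Hz gives 128 time steps.
model = CrossModalAttentionSketch()
eeg = torch.randn(8, 128, 64)    # batch of 8 EEG segments
audio = torch.randn(8, 128, 64)  # matching speech-feature segments
logits = model(eeg, audio)       # (8, 2) attended-speaker scores
```

In this sketch the two attention directions are symmetric; the actual model may weight, stack, or combine them differently, but the core idea of directly attending across the audio and EEG modalities is the same.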

Highlights

  • Humans have the ability to pay selective attention to one speaker in a multispeaker environment, called the “cocktail party scenario” (Cherry, 1953; Haykin and Chen, 2005)

  • We systematically investigated the effectiveness of cross-modal attention-based auditory attention detection (AAD)

  • We note that the convolutional neural network (CNN) model in our study focused on processing the common spatial pattern (CSP)-enhanced EEG data (a rough sketch of CSP filtering follows this list)
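As a rough illustration of what "CSP-enhanced EEG" could involve, the sketch below computes common spatial pattern filters from two classes of EEG trials (e.g., attend-left versus attend-right) using the standard generalized eigenvalue formulation. The paper's actual preprocessing pipeline may differ; the function name, trial shapes, and number of filters here are assumptions for illustration only.

```python
# Rough sketch of common spatial pattern (CSP) filtering for EEG, shown only to
# illustrate the "CSP-enhanced EEG" idea; the paper's actual pipeline may differ.
import numpy as np
from scipy.linalg import eigh


def csp_filters(trials_a, trials_b, n_filters=6):
    """trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples)
    for the two attention classes (e.g., attend-left vs. attend-right)."""
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenvalue problem: cov_a w = lambda (cov_a + cov_b) w.
    eigvals, eigvecs = eigh(cov_a, cov_a + cov_b)
    # Keep filters from both ends of the spectrum (most discriminative directions).
    order = np.argsort(eigvals)
    picks = np.concatenate([order[: n_filters // 2], order[-(n_filters // 2):]])
    return eigvecs[:, picks].T  # (n_filters, n_channels)


# Example with random data: 20 trials, 64 channels, 2-s windows at 128 Hz.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 64, 256))
B = rng.standard_normal((20, 64, 256))
W = csp_filters(A, B)   # spatial filters
filtered = W @ A[0]     # CSP-enhanced view of one EEG trial
```

The resulting spatially filtered trials would then be the kind of input a CNN, as mentioned in the highlight above, could process.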


Introduction

Humans have the ability to pay selective attention to one speaker in a multispeaker environment, known as the “cocktail party scenario” (Cherry, 1953; Haykin and Chen, 2005). Existing approaches usually fail in such cocktail-party situations, and many hearing aid users complain about the difficulty of following a target speaker in the presence of noise and other competing speech streams (Chung, 2004). Recent developments in the field of neuroscience have shown that it is possible to decode auditory attention in a multi-talker environment from brain signals (Ding and Simon, 2012; Mesgarani and Chang, 2012). This is known as auditory attention detection (AAD). The development of AAD opens up new opportunities for the cognitive control of auditory prostheses, such as hearing aids and cochlear implants.
