Abstract
Humans have the ability to pay attention to one of the sound sources in a multispeaker acoustic environment. Auditory attention detection (AAD) seeks to detect the attended speaker from one's brain signals, which will enable many innovative human–machine systems. However, effective representation learning of electroencephalography (EEG) signals remains a challenge. In this article, we propose a neural attention mechanism that dynamically assigns differentiated weights to the subbands and the channels of EEG signals to derive discriminative representations for AAD. In a nutshell, we would like to build a computational attention mechanism, i.e., neural attention, to model the auditory attention in the human brain. We incorporate the proposed neural attention into an AAD system, and validate the neural attention mechanism through comprehensive experiments on two publicly available datasets. The experimental results demonstrate that the proposed system significantly outperforms the state-of-the-art reference baselines.
Highlights
Humans have the ability to focus their auditory attention on one speaker, and ignore other sound sources in a multispeaker acoustic environment. (Manuscript received March 8, 2021; revised June 1, 2021, August 13, 2021, September 22, 2021, and October 20, 2021; accepted October 23, 2021.)
Unlike traditional channel selection, we propose a soft channel attention mechanism, which seeks to capture the interchannel relationships of EEG signals and adaptively assign differentiated weights to individual channels according to the EEG signals and the speech envelopes.
With a 2-s decision window, the convolutional neural network (CNN) model obtains an average accuracy of 79.6% (SD: 11.67), and the CNN with frequency attention (CNN-F) model gains an improvement of 4.1%.
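The soft channel attention described above can be illustrated with a minimal squeeze-and-excitation-style sketch. This is a hypothetical illustration, not the paper's architecture: the function name `soft_channel_attention` and the projection parameters `w_proj` and `v_proj` are placeholders for learned weights.

```python
import numpy as np

def soft_channel_attention(eeg, w_proj, v_proj):
    """Hypothetical sketch of soft channel attention over EEG.

    eeg:    (channels, time) array of EEG samples.
    w_proj: (hidden, channels) projection, v_proj: (channels, hidden);
            both would be learned in practice, names are illustrative.
    Returns per-channel weights in (0, 1) summing to 1, and the
    reweighted EEG.
    """
    # Squeeze: summarize each channel by its temporal mean.
    summary = eeg.mean(axis=1)               # (channels,)
    # Excitation: a small two-layer scoring of inter-channel relations.
    hidden = np.tanh(w_proj @ summary)       # (hidden,)
    scores = v_proj @ hidden                 # (channels,)
    # Softmax turns scores into differentiated, adaptive weights,
    # rather than the hard 0/1 mask of traditional channel selection.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, eeg * weights[:, None]
```

Because the weights come from a softmax rather than a hard threshold, every channel keeps a nonzero contribution, and the weighting can change with the input, which is the "soft" and "adaptive" aspect the highlight refers to.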
Summary
Humans have the ability to focus their auditory attention on one speaker, and ignore other sound sources in a multispeaker acoustic environment. Along the same lines, convolutional neural network (CNN) models [12], [25], [26] were studied by directly relating both the raw EEG signals and the speech stimulus to the attention detection decision, without reconstructing the auditory stimulus. Let us call this an end-to-end classification approach. The contributions of EEG channels and frequency subbands to AAD performance may vary over time. This prompts us to study a nonlinear, dynamic weighting mechanism that is known as the neural attention mechanism in deep neural networks.
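The end-to-end idea, mapping raw EEG together with the candidate speech stimuli directly to an attended-speaker decision, can be sketched as below. This is a deliberately minimal linear stand-in for the CNN classifiers cited above, assuming two competing speakers; the function name `aad_end_to_end` and the parameters `w` and `b` are illustrative placeholders for a trained model.

```python
import numpy as np

def aad_end_to_end(eeg, env_a, env_b, w, b):
    """Hypothetical end-to-end AAD decision sketch (not the cited CNNs).

    eeg:           (channels, time) raw EEG segment.
    env_a, env_b:  speech envelopes of the two competing speakers.
    w, b:          classifier parameters that would be learned.
    Returns the attended-speaker index (0 or 1) and class probabilities,
    with no intermediate stimulus-reconstruction step.
    """
    # Concatenate EEG and both candidate envelopes into one input.
    x = np.concatenate([eeg.ravel(), env_a, env_b])
    logits = w @ x + b                       # one score per speaker
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(probs.argmax()), probs
```

The contrast with stimulus-reconstruction methods is that no estimate of the attended envelope is ever produced; the model scores the two hypotheses directly from the joint input.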