Abstract

Humans have the ability to pay attention to one of the sound sources in a multispeaker acoustic environment. Auditory attention detection (AAD) seeks to detect the attended speaker from one's brain signals, which will enable many innovative human–machine systems. However, effective representation learning of electroencephalography (EEG) signals remains a challenge. In this article, we propose a neural attention mechanism that dynamically assigns differentiated weights to the subbands and the channels of EEG signals to derive discriminative representations for AAD. In a nutshell, we would like to build a computational attention mechanism, i.e., neural attention, to model the auditory attention in the human brain. We incorporate the proposed neural attention into an AAD system and validate the neural attention mechanism through comprehensive experiments on two publicly available datasets. The experimental results demonstrate that the proposed system significantly outperforms the state-of-the-art reference baselines.
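To make the idea of subband weighting concrete, the sketch below shows one way such a frequency attention module could be written in PyTorch: each EEG subband is summarized by a pooled statistic, a small network maps these statistics to a soft mask, and the mask rescales the subbands. The pooling choice, layer sizes, and tensor layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class FrequencyAttention(nn.Module):
    """Toy sketch: assign an input-dependent weight to each EEG subband.

    Assumed input shape: (batch, n_bands, n_channels, n_samples).
    Returns the reweighted EEG and the attention mask.
    """

    def __init__(self, n_bands: int, hidden: int = 16):
        super().__init__()
        # Small bottleneck MLP mapping pooled subband statistics to weights.
        self.mlp = nn.Sequential(
            nn.Linear(n_bands, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_bands),
            nn.Sigmoid(),  # soft mask in (0, 1)
        )

    def forward(self, eeg: torch.Tensor):
        # Summarize each subband by its average power over channels and time.
        band_stats = eeg.pow(2).mean(dim=(2, 3))      # (batch, n_bands)
        mask = self.mlp(band_stats)                   # (batch, n_bands)
        weighted = eeg * mask[:, :, None, None]       # broadcast over channels/time
        return weighted, mask


# Usage with hypothetical sizes: 5 subbands, 64 channels, 128 samples.
eeg = torch.randn(8, 5, 64, 128)
weighted, mask = FrequencyAttention(n_bands=5)(eeg)
```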

Highlights

  • Humans have the ability to focus their auditory attention on one speaker and ignore other sound sources in a multispeaker acoustic environment.

  • Unlike traditional channel selection, we propose a soft channel attention mechanism, which seeks to capture the interchannel relationship of EEG signals and adaptively assign differentiated weights to individual channels according to the EEG signals and the speech envelopes (a minimal sketch follows this list).

  • With a 2-s decision window, the convolutional neural network (CNN) model obtains an average accuracy of 79.6% (SD: 11.67), and the CNN with frequency attention (CNN-F) model gains an improvement of 4.1%.
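As referenced above, here is a minimal sketch of the kind of soft channel attention described in the second highlight: channel weights are computed from pooled EEG and speech-envelope statistics and applied multiplicatively, rather than selecting a fixed channel subset. Shapes, pooling, and layer sizes are hypothetical, not the authors' network.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Toy sketch of soft channel attention conditioned on EEG and envelopes.

    Assumed shapes: eeg (batch, n_channels, n_samples),
    envelopes (batch, n_envelopes, n_samples).
    """

    def __init__(self, n_channels: int, n_envelopes: int = 2, hidden: int = 32):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(n_channels + n_envelopes, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_channels),
            nn.Softmax(dim=-1),  # soft weights across channels
        )

    def forward(self, eeg: torch.Tensor, envelopes: torch.Tensor):
        # Pool EEG channels and envelopes over time into summary statistics.
        eeg_stats = eeg.abs().mean(dim=-1)        # (batch, n_channels)
        env_stats = envelopes.abs().mean(dim=-1)  # (batch, n_envelopes)
        weights = self.score(torch.cat([eeg_stats, env_stats], dim=-1))
        # Re-weight channels instead of discarding them (soft, not hard, selection).
        return eeg * weights.unsqueeze(-1), weights


# Usage with hypothetical sizes: 64 channels, 2 speech envelopes, 128 samples.
eeg = torch.randn(8, 64, 128)
envelopes = torch.randn(8, 2, 128)
weighted, weights = ChannelAttention(n_channels=64)(eeg, envelopes)
```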

Summary

INTRODUCTION

Humans have the ability to focus their auditory attention on one speaker and ignore other sound sources in a multispeaker acoustic environment. In the same vein, the convolutional neural network (CNN) [12], [25], [26] was studied by directly relating both the raw EEG signals and the speech stimulus to the attention detection decision, without reconstructing the auditory stimulus. Let us call this an end-to-end classification approach. The contributions of EEG channels and frequency subbands to AAD performance may vary over time. This prompts us to study a nonlinear, dynamic weighting mechanism, known as the neural attention mechanism in deep neural networks.
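For illustration, an end-to-end classifier of this kind can be sketched as a small CNN that takes an EEG decision window together with the two candidate speech envelopes and outputs the attention decision directly, without stimulus reconstruction. The architecture below is a hedged toy example with made-up layer sizes, not the network studied in [12], [25], [26].

```python
import torch
import torch.nn as nn


class EndToEndAAD(nn.Module):
    """Toy end-to-end AAD classifier: EEG window + two speech envelopes -> decision."""

    def __init__(self, n_channels: int = 64):
        super().__init__()
        # EEG channels and the two envelopes are stacked as input "channels".
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels + 2, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(16, 2)  # speaker A vs. speaker B

    def forward(self, eeg: torch.Tensor, envelopes: torch.Tensor):
        # eeg: (batch, n_channels, n_samples); envelopes: (batch, 2, n_samples)
        x = torch.cat([eeg, envelopes], dim=1)
        feats = self.conv(x).squeeze(-1)      # (batch, 16)
        return self.classifier(feats)         # logits over the two speakers


# Usage with a hypothetical 2-s decision window at 64 Hz (128 samples).
logits = EndToEndAAD()(torch.randn(4, 64, 128), torch.randn(4, 2, 128))
```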

NEURAL ATTENTION FOR AAD
Frequency Attention
Channel Attention
Aligning Speech Envelopes With EEG Features
Back-End Classifier
AAD Dataset
Data Preparation
Training and Evaluation
EXPERIMENT RESULTS
Channel Attention With Broadband EEG
Frequency Attention With Multiband EEG
Frequency-Channel Attention With Multiband EEG
Experiments on DTU Dataset
Speech Envelopes as References
EMPIRICAL ANALYSIS OF NEURAL ATTENTION
Comparative Study
Analysis of Channel Attention Mask
Visualization of Frequency Attention Mask
CONCLUSION