Extracting the Auditory Attention in a Dual-Speaker Scenario From EEG Using a Joint CNN-LSTM Model.

Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

doi:10.3389/fphys.2021.700655

Abstract

The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling the relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues for integrating neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms for inferring auditory attention are based on linear systems theory, in which cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN) and long short-term memory (LSTM) model to infer auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies attention as directed to one of the speakers. We evaluated the reliability of our network using three datasets comprising 61 subjects in total, each of whom undertook a dual-speaker experiment. The three datasets used speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated how much sparsity the model can tolerate by means of magnitude pruning and found that it tolerates up to 50% sparsity without substantial loss of decoding accuracy.
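
Two generic techniques named in the abstract can be illustrated in a few lines: the correlation-based decision rule commonly used with linear attention decoders, and magnitude pruning. The Python sketch below is illustrative only and is not the authors' implementation. It assumes the linear model has already been fitted and that its reconstructed envelope is available as a plain list; all function names (`pearson`, `decode_attention`, `magnitude_prune`, `sparsity_of`) are hypothetical. The pruning shown is unstructured and applied to one flat weight list; the abstract does not state whether the model was pruned per layer or globally.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: an illustrative sketch, not the authors' code.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length signals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("signals must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def decode_attention(reconstructed: Sequence[float],
                     speaker_envelopes: Sequence[Sequence[float]]) -> int:
    """Return the index of the speaker whose envelope correlates best with
    the envelope reconstructed from EEG by an (assumed, pre-fitted) linear model."""
    scores = [pearson(reconstructed, env) for env in speaker_envelopes]
    return max(range(len(scores)), key=lambda i: scores[i])


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the given fraction of weights with the smallest absolute values."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = round(sparsity * len(weights))
    # Rank weight indices by magnitude; ties are broken by position.
    order = sorted(range(len(weights)), key=lambda i: (abs(weights[i]), i))
    pruned = [float(w) for w in weights]
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights: Sequence[float]) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights) if weights else 0.0


if __name__ == "__main__":
    envelope_hat = [0.0, 1.0, 2.0, 3.0]
    speakers = [[3.0, 2.0, 1.0, 0.0], [0.1, 1.2, 1.9, 3.1]]
    print(decode_attention(envelope_hat, speakers))  # 1
    w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
    p = magnitude_prune(w, 0.5)
    print(p, sparsity_of(p))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0] 0.5
```

Deep-learning frameworks provide the pruning step directly; for example, PyTorch's `torch.nn.utils.prune.l1_unstructured` masks the smallest-magnitude entries of a parameter tensor.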


Journal: Frontiers in Physiology
Publication Date: Aug 2, 2021
Citations: 19
License type: CC BY 4.0

Similar Papers

Machine-learning-based model and simulation analysis of PM2.5 concentration prediction in Beijing
...
工程科学学报 (Chinese Journal of Engineering) | VOL. 41
20 Mar 2019

Emergency sign language recognition from variant of convolutional neural network (CNN) and long short term memory (LSTM) models
Muhammad Amir As'Ari ... Guat Si Qi
International Journal of Advances in Intelligent Informatics | VOL. 10
29 Feb 2024

Gujarati Task Oriented Dialogue Slot Tagging Using Deep Neural Network Models
Rachana Parikh ... Hiren Joshi
01 Jan 2020

Auditory attention decoding from electroencephalography based on long short-term memory networks
Yun Lu ... Shixiong Chen
Biomedical Signal Processing and Control | VOL. 70
15 Jul 2021
