Comparison of Two-Talker Attention Decoding from EEG with Nonlinear Neural Networks and Linear Methods

Gregory Ciccarelli, Christopher J Smalt, James O’Sullivan, Joseph Perricone, Michael Nolan, Stephanie Haro, Thomas F Quatieri, Paul T Calamia, Nima Mesgarani

doi:10.1038/s41598-019-47795-0

Open Access

https://doi.org/10.1038/s41598-019-47795-0

Abstract

Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet, despite the dry system having one third as many EEG channels as the wet system. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
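The traditional two-stage approach described in the abstract can be illustrated with a minimal sketch. It assumes a ridge-regularized linear decoder over time-lagged EEG for the reconstruction stage and Pearson correlation for the similarity stage; the function names, lag count, ridge value, and synthetic data are all hypothetical choices made for illustration and are not taken from the authors' released software.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-shifted copies of eeg (samples x channels).
    Column block k holds the EEG at time t + k, since the neural
    response follows the stimulus."""
    n, c = eeg.shape
    X = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        X[: n - k, k * c:(k + 1) * c] = eeg[k:]
    return X

def train_decoder(eeg, attended_env, n_lags=8, ridge=1.0):
    """Stage 1 (training): ridge regression from lagged EEG to the
    attended speech envelope. n_lags and ridge are assumed values."""
    X = lagged(eeg, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ attended_env)

def decode_attention(eeg, env_a, env_b, w, n_lags=8):
    """Stage 1: reconstruct an envelope from EEG.
    Stage 2: pick the candidate stream that correlates best with it."""
    recon = lagged(eeg, n_lags) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return ("A" if r_a > r_b else "B"), r_a, r_b

def simulate(n=2000, channels=8, delay=3, seed=0):
    """Synthetic stand-in for real data: noise EEG that carries a
    delayed copy of talker A's envelope (the attended talker)."""
    rng = np.random.default_rng(seed)
    env_a = rng.standard_normal(n)
    env_b = rng.standard_normal(n)
    mix = rng.standard_normal(channels)
    eeg = rng.standard_normal((n, channels))
    eeg[delay:] += np.outer(env_a[:-delay], mix)
    return eeg, env_a, env_b

if __name__ == "__main__":
    eeg, env_a, env_b = simulate()
    w = train_decoder(eeg[:1000], env_a[:1000])   # train on first half
    choice, r_a, r_b = decode_attention(
        eeg[1000:], env_a[1000:], env_b[1000:], w)  # test on second half
    print(choice, round(r_a, 3), round(r_b, 3))
```

The neural-network architecture proposed in the paper replaces the explicit correlation step in `decode_attention` with a learned decision; that variant is not sketched here.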

Highlights

Our results indicate that this new architecture outperforms the traditional stimulus-reconstruction decoders by a significant margin on both datasets
We see that a temporal response function (TRF) peak occurs at 200 ms in the center of the head and dissipates afterwards
The deep neural network (DNN) classifier approach dramatically outperformed the traditional segregated architecture in decoding accuracy (81% wet, 87% dry), with a performance advantage in all of the dry EEG cases and all but two of the wet EEG cases, and showed a smaller variance among the subjects

Summary

Linear Methods

The attention decision typically is between two simultaneous, spatially separated talkers. The stimulus-reconstruction approach has been modified to evaluate: sensitivity to number of EEG channels and size of training data[14]; robustness to noisy reference stimuli[15,16]; the use of auditory-inspired stimulus pre-processing including subband envelopes with amplitude compression[17]; cepstral processing of EEG and speech signals for improved correlations[25]; the effects of speaker (spatial) separation and additional speech-like background noise[18]; the effects of (simulated) reverberation[19]; and potential performance improvements through various regularization methods[20]. Our results indicate that this new architecture outperforms the traditional stimulus-reconstruction decoders by a significant margin on both datasets.

Methods

Results

Discussion