Abstract
Auditory attention decoding (AAD) through a brain-computer interface has seen a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticography (ECoG) recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet system, despite having one third as many EEG channels. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
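The traditional two-stage pipeline described above can be sketched in code. The following is a minimal illustration, not the authors' actual implementation: a ridge-regression decoder maps time-lagged EEG to the attended speech envelope, the reconstruction is then correlated with each candidate envelope, and the talker with the higher Pearson correlation is selected. All function names, the lag range, and the regularization value are illustrative assumptions.

```python
import numpy as np

def build_lagged(eeg, lags):
    # eeg: (T, C) samples x channels; stack time-lagged copies -> (T, C*len(lags))
    T, C = eeg.shape
    X = np.zeros((T, C * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0.0   # zero the wrapped-around rows
        elif lag < 0:
            shifted[lag:] = 0.0
        X[:, i * C:(i + 1) * C] = shifted
    return X

def train_decoder(eeg, envelope, lags, reg=1e2):
    # Ridge regression from lagged EEG to the attended-speech envelope
    X = build_lagged(eeg, lags)
    w = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ envelope)
    return w

def decode_attention(eeg, env_a, env_b, w, lags):
    # Stage 1: reconstruct the envelope; Stage 2: correlate with each
    # candidate stream and pick the better match (0 = talker A, 1 = talker B)
    recon = build_lagged(eeg, lags) @ w
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 0 if r_a > r_b else 1
```

In practice the decoder is trained on held-out data and the correlation is computed over short decision windows, whose length trades off decoding accuracy against responsiveness.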
Highlights
Our results indicate that this new architecture outperforms the traditional stimulus-reconstruction decoders by a significant margin on both datasets
A temporal response function (TRF) peak occurs at 200 ms over the center of the head and dissipates afterwards
The deep neural network (DNN) classifier approach dramatically outperformed the traditional segregated architecture in decoding accuracy (81% wet, 87% dry), with a performance advantage in all of the dry EEG cases and all but two of the wet EEG cases, and showed smaller variance across subjects
Summary
Gregory Ciccarelli[1], Michael Nolan[1], Joseph Perricone[1], Paul T.
The attention decision is typically made between two simultaneous, spatially separated talkers. This approach has been modified to evaluate: sensitivity to the number of EEG channels and the size of the training data[14]; robustness to noisy reference stimuli[15,16]; the use of auditory-inspired stimulus pre-processing, including subband envelopes with amplitude compression[17]; cepstral processing of EEG and speech signals for improved correlations[25]; the effects of speaker (spatial) separation and additional speech-like background noise[18]; the effects of (simulated) reverberation[19]; and potential performance improvements through various regularization methods[20]. Our results indicate that this new architecture outperforms the traditional stimulus-reconstruction decoders by a significant margin on both datasets.
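The end-to-end alternative replaces both stages with a single network that ingests an EEG window and directly outputs an attention decision, subsuming the explicit reconstruction and similarity steps. The sketch below is purely illustrative of that idea and is not the paper's architecture: a forward pass through one temporal convolution, global average pooling, and a dense softmax over the two talkers. All layer shapes and names are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dnn_classify(eeg_window, params):
    # eeg_window: (T, C) samples x channels.
    # params: conv kernel Wc of shape (K, C, F), dense weights Wd of
    # shape (F, 2), dense bias bd of shape (2,).
    Wc, Wd, bd = params
    K, C, F = Wc.shape
    T = eeg_window.shape[0]
    # 1-D temporal convolution across all channels -> (T-K+1, F)
    conv = np.stack([
        relu(np.tensordot(eeg_window[t:t + K], Wc, axes=([0, 1], [0, 1])))
        for t in range(T - K + 1)
    ])
    pooled = conv.mean(axis=0)        # global average pooling -> (F,)
    logits = pooled @ Wd + bd         # dense layer -> (2,)
    e = np.exp(logits - logits.max()) # numerically stable softmax
    return e / e.sum()                # P(talker A), P(talker B)
```

Training such a classifier end to end lets the network learn whatever neural features best predict the attended talker, rather than committing in advance to envelope reconstruction followed by correlation.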