Articles published on Detection Of Auditory Attention
- Research Article
- 10.1016/j.patrec.2026.03.005
- May 1, 2026
- Pattern Recognition Letters
- Haoqi Hu + 3 more
The effect of speech representations on EEG-based auditory attention detection
- Research Article
- 10.1007/s10439-026-04068-y
- Mar 17, 2026
- Annals of biomedical engineering
- Richard Gall + 5 more
In the context of a multi-speaker "cocktail party" scenario where listeners selectively focus on specific speakers, human auditory attention networks have shown a strong correlation with Electroencephalography (EEG) measurements. However, current EEG-based auditory attention detection (AAD) methods, mostly using artificial neural networks (ANN), face limitations on edge computing platforms due to extended decision windows, high power consumption, and substantial memory requirements linked to multiple EEG channels. This paper introduces a novel hybrid convolutional-spiking neural network (CNN-SNN) architecture, inspired by the auditory cortex, combining EEG data with multi-speaker speech envelopes, enabling effective auditory attention decoding within 0.5-s timeframes. Our approach reduces EEG channels, minimizes computational operations, and quantizes weight parameters while maintaining high accuracy. We validate this approach on our dataset and compare it to state-of-the-art methods on a publicly available dataset. CNN-SNN demonstrates superior performance, achieving up to 10% increase in decoding accuracy, while using 87.5% fewer EEG channels and 75% smaller bit precision for weight quantization compared to existing methods. These results offer promise for edge computing applications, such as hearing aids, emphasizing short decision windows, minimal EEG channels, and strict power and memory constraints.
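The abstract reports "75% smaller bit precision" for weight quantization and "87.5% fewer EEG channels" but does not give the underlying numbers. One reading consistent with the arithmetic is 8-bit weights versus 32-bit floats and 8 of 64 channels; both are assumptions, not figures from the paper. The sketch below shows generic uniform symmetric per-tensor quantization, which is not necessarily the scheme the CNN-SNN uses.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, n_bits: int = 8):
    """Uniform symmetric per-tensor quantization to signed n-bit integer codes.

    Returns the integer codes and the scale, so that weights ~= codes * scale.
    """
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for 8 bits
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))                          # stand-in for a trained weight tensor
q, s = quantize_symmetric(w, n_bits=8)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat)))            # rounding error is at most scale / 2

# Assumed baselines (not stated in the abstract): float32 weights and a 64-channel montage.
bit_reduction = 1 - 8 / 32                            # 0.75 -> "75% smaller bit precision"
channel_reduction = 1 - 8 / 64                        # 0.875 -> "87.5% fewer EEG channels"
```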
- Research Article
- 10.1007/s11571-025-10371-6
- Dec 1, 2025
- Cognitive neurodynamics
- Yuanlin Dong + 4 more
Humans demonstrate the ability to focus auditory attention in noisy environments, enabling them to concentrate on a specific speaker at a cocktail party. Neuroscientific research has shown that auditory attention itself is a dynamic brain activity that evolves over time, which has inspired studies on electroencephalography (EEG)-based auditory attention detection (AAD). This paper proposes a neural attention mechanism model named GSANet, which employs a self-attention mechanism to model the temporal dynamics of EEG signals while dynamically assigning weights to EEG channels through a graph attention mechanism. In brief, GSANet simulates the neural attention mechanisms of the human brain to extract discriminative representations from EEG signals for training high-performance classifiers. We conducted experiments on two public datasets, KUL and DTU, achieving overall decoding accuracies of 94.5% and 79.2%, respectively, under a 1-second decision window, significantly outperforming baseline models across all comparative conditions. The code of our proposed method will be available at: https://github.com/dalin6666/GSANet.
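GSANet's exact layer configuration is not given in the abstract. As a rough illustration of what "a self-attention mechanism to model the temporal dynamics of EEG signals" involves, here is a generic single-head scaled dot-product self-attention over the time axis of one decision window. The random projection matrices, the 128 Hz rate, and the 64-channel shape are illustrative assumptions, and the graph-attention weighting of channels is not shown.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)           # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(eeg: np.ndarray, d_k: int, rng: np.random.Generator):
    """Single-head scaled dot-product self-attention over time.

    eeg: (T, C) array of T time steps and C channels.
    The projections are random here; a trained model would learn them.
    """
    T, C = eeg.shape
    W_q = rng.normal(scale=C ** -0.5, size=(C, d_k))
    W_k = rng.normal(scale=C ** -0.5, size=(C, d_k))
    W_v = rng.normal(scale=C ** -0.5, size=(C, d_k))
    Q, K, V = eeg @ W_q, eeg @ W_k, eeg @ W_v
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (T, T): each step attends to every step
    return weights @ V, weights

rng = np.random.default_rng(0)
window = rng.normal(size=(128, 64))                   # e.g. 1 s at an assumed 128 Hz, 64 channels
out, attn = temporal_self_attention(window, d_k=16, rng=rng)
```

Each row of `attn` is a probability distribution over time steps, which is how the mechanism "assigns weights" to different moments in the window.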
- Research Article
1
- 10.1038/s41598-025-22177-x
- Nov 3, 2025
- Scientific Reports
- Yahao Wen + 3 more
Humans exhibit a remarkable ability to selectively focus on auditory stimuli in multi-speaker environments, such as cocktail parties. Auditory Attention Detection (AAD) aims to identify the conversation that a listener is attending to through the analysis of neural signals, particularly electroencephalography (EEG) data. However, current methodologies in this domain encounter several significant limitations. While many existing AAD methods use additional information, such as spatial or frequency features, to improve decoding accuracy, they often miss the relationships between signals from different EEG channels. To address these shortcomings, this paper introduces a novel hybrid channel attention network for AAD. Our approach is the first to integrate spatial-temporal filtering, dynamic multi-scale feature fusion, and efficient cross-channel attention into a single unified architecture, enabling it to capture complex neural patterns of attention that previous methods overlooked. Our proposed network first extracts spatial-temporal features from raw EEG signals using a dedicated spatial-temporal feature extraction module. The extracted features are then processed by a module that combines information across different time scales and uses an attention mechanism to identify important relationships between EEG channels. Experimental results demonstrate that our network achieves superior classification performance compared to baseline methods, particularly under conditions with short decision windows. Notably, while maintaining high accuracy, the proposed architecture significantly reduces the number of model parameters.
- Research Article
- 10.1088/1742-6596/3147/1/012013
- Nov 1, 2025
- Journal of Physics: Conference Series
- Jiazhen Li + 6 more
The Auditory Attention Detection (AAD) approach seeks to identify the target speaker in multi-talker environments by analyzing electroencephalography (EEG) signals, addressing the cocktail party problem. However, existing EEG-based AAD methods overlook the spatial correlation patterns across different ranges among EEG channels and the collaborative capture of instantaneous and long-range features in temporal dynamics, which limits their ability to extract fine spatiotemporal features of brain activity. To address these issues, this paper proposes the Enhanced Brain Feature Network (EBFNet), which consists of the Spatial Feature Extraction (SFE) module, the Temporal Sequence Modeling (TSM) module, and the Attention Temporal-Spatial Fusion (ATF) module. The SFE employs multi-scale convolutions to capture spatial patterns across EEG channels. The TSM models instantaneous and sustained temporal patterns by processing long-range dependencies through global multi-head attention and capturing short-range changes with local convolutions. The ATF optimizes spatiotemporal features by weighting key channel-time interactions via attention gating, enhancing their relevance for attention decoding. Experimental results on the KUL dataset show that EBFNet increases classification accuracy by 8.0%, 1.7%, and 1.5% for 0.1-second, 1-second, and 2-second decision windows, respectively, while reducing trainable parameters by 50%.
- Research Article
- 10.1109/embc58623.2025.11251825
- Jul 14, 2025
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
- Gabriel Ivucic + 3 more
Auditory attention detection (AAD) reveals listeners' attention to a speech stimulus based on the electroencephalography (EEG) signals it elicits. We propose a geometric graph convolutional network (Geo-GCN) that uses the physical layout of EEG sensors to construct a distance-based adjacency matrix. This enables Geo-GCN to perform more biologically informed feature learning than standard GCNs. Using data from participants with normal hearing (NH) and with hearing impairment (HI), we show that our method outperforms traditional GCNs. Geo-GCN also shows lower performance variability across participants. Separate analyses of the NH and HI groups show consistent gains over the standard GCN, underlining the benefit of explicitly modeling scalp geometry. These findings highlight the potential of geometry-aware graph neural networks to improve EEG-based auditory attention detection, particularly in heterogeneous populations with varied hearing capabilities.
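The abstract says Geo-GCN builds its adjacency matrix from distances between EEG sensors but does not give the formula. A common choice, assumed here, is a Gaussian kernel on Euclidean distances between 3-D electrode coordinates, followed by the symmetric-normalized graph convolution of a standard GCN. The bandwidth `sigma`, the toy electrode coordinates, and the helper names are hypothetical.

```python
import numpy as np

def distance_adjacency(positions: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian-kernel adjacency from electrode coordinates of shape (C, 3)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)              # (C, C) pairwise Euclidean distances
    return np.exp(-dist ** 2 / (2 * sigma ** 2))      # closer sensors get larger weights; diagonal is 1

def gcn_layer(x: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)                             # > 0 because every node has a self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ x @ weight, 0.0)

# Hypothetical 4-electrode layout: electrode 3 lies far from electrode 0.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 0.0]])
adj = distance_adjacency(pos, sigma=1.0)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5))                    # 5 features per electrode (illustrative)
W = rng.normal(size=(5, 2))
h = gcn_layer(features, adj, W)
```

A standard GCN that ignores geometry would typically use a learned or fully connected adjacency instead; here neighboring sensors dominate each node's aggregated features.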
- Research Article
- 10.1109/embc58623.2025.11252602
- Jul 1, 2025
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
- Yuting Ding + 2 more
Electroencephalography (EEG)-based auditory attention detection (AAD) plays a crucial role in recent auditory brain-computer interface applications. However, the performance of AAD models in cross-subject tasks tends to degrade significantly because EEG features differ greatly across subjects. To address this challenge, we propose a novel framework, AAD-ContrastNet, that incorporates contrastive learning to refine the temporal features from EEG and reduce the variance of EEG features across subjects. AAD-ContrastNet consists of four main components: (a) an attention-based EEG encoder; (b) a contrastive-learning-based EEG encoder; (c) a feature refinement module; and (d) a classifier. t-SNE visualization results show that combining contrastive learning with cross-attention feature refinement significantly improves the generalization of extracted EEG features. By comparing with SOTA models (i.e., DenseNet-3D and DARNet), we validate the significant effect of AAD-ContrastNet in improving cross-subject decoding accuracy, highlighting its potential in enhancing the robustness and generalization of EEG-based AAD systems. Clinical Relevance: This study demonstrates the potential of contrastive learning in mitigating cross-subject performance degradation, providing a solid foundation for applying generalized auditory brain-computer interface systems.
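The abstract does not name the contrastive objective used by AAD-ContrastNet. InfoNCE is one widely used option and is assumed here purely for illustration. In this sketch, row i of `z_a` and row i of `z_b` are treated as a positive pair (for example, two hypothetical views of the same EEG segment), and every other row in the batch serves as a negative.

```python
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss for N paired embeddings of shape (N, D).

    Positives lie on the diagonal of the similarity matrix; off-diagonal entries are negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                        # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))                 # cross-entropy on the positive pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z + 0.05 * rng.normal(size=z.shape))   # paired views nearly identical
loss_random = info_nce(z, rng.normal(size=z.shape))              # pairs carry no shared structure
```

Minimizing this loss pulls paired embeddings together and pushes unpaired ones apart. This is the general mechanism by which contrastive training can reduce subject-specific variance in the feature space.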
- Research Article
- 10.1109/embc58623.2025.11252872
- Jul 1, 2025
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
- Saurav Pahuja + 5 more
Auditory Attention Detection (AAD) is essential for developing advanced brain-computer interfaces, including neuro-steered hearing technologies capable of functioning in complex auditory environments. In this study, we propose XAGnet, a novel method that leverages ear-centered EEG (ear-EEG) data to model both intra-ear and inter-ear neural dependencies in order to detect auditory attention to one of several spatial locations. Specifically, Graph Convolutional Networks (GCNs) are applied separately to the left and right ear-EEG signals to extract spatial features capturing intra-ear interactions on each side. A cross-attention mechanism then models inter-ear interactions between the left and right ears. The attended features are combined for multi-class classification, with each class representing a speaker or a speaking location. We evaluate our method on a publicly available ear-EEG dataset involving AAD tasks with four speakers. Experimental results demonstrate that XAGnet outperforms baseline models, highlighting the effectiveness of modeling both intra-ear and inter-ear dependencies in AAD tasks.
- Research Article
1
- 10.1007/s12070-025-05679-y
- Jun 14, 2025
- Indian Journal of Otolaryngology and Head & Neck Surgery
- Chunli Wang + 4 more
Research Progress on Auditory Attention Detection in EEG
- Research Article
4
- 10.1016/j.apacoust.2024.110474
- Mar 1, 2025
- Applied Acoustics
- Xuefei Wang + 3 more
A multi-task learning and auditory attention detection framework towards EEG-assisted target speech extraction
- Research Article
3
- 10.1016/j.neunet.2024.106977
- Mar 1, 2025
- Neural networks : the official journal of the International Neural Network Society
- Yawen Lan + 3 more
Low-power and lightweight spiking transformer for EEG-based auditory attention detection
- Research Article
1
- 10.3758/s13415-024-01260-2
- Jan 16, 2025
- Cognitive, affective & behavioral neuroscience
- Joan Belo + 2 more
Focusing on a single source within a complex auditory scene is challenging. M/EEG-based auditory attention detection (AAD) makes it possible to detect which stream an individual is attending to within a set of multiple concurrent streams. The high interindividual variability in auditory attention detection performance is often attributed to physiological factors and to the signal-to-noise ratio of neural data. We hypothesize that executive functions (in particular sustained attention, working memory, and attentional inhibition) may partly explain the variability in auditory attention detection performance, because they support the cognitive processes required when listening to complex auditory scenes. We chose a particularly challenging auditory scene by dichotically presenting polyphonic classical piano excerpts that lasted 1 min each. Two different excerpts were presented simultaneously, one in each ear. Forty-one participants, with different degrees of musical expertise, listened to these complex auditory scenes while focusing on one ear as we recorded their EEG. Participants also completed several tasks assessing executive functions. As expected, EEG-based auditory attention detection was greater for attended than for unattended stimuli. Importantly, attentional inhibition ability explained 6% of the variance in reconstruction accuracy and 8% of the variance in classification accuracy. No other executive function was a significant predictor of reconstruction or classification accuracy. No clear effect of musical expertise was found on reconstruction or classification performance. In conclusion, cognitive factors seem to affect the robustness of the neural auditory representation and hence the performance of EEG-based decoding approaches. Taking advantage of this relation could help improve next-generation hearing aids.
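The abstract refers to "reconstruction accuracy" and "classification accuracy" without describing the decoder. A classic way to obtain both, assumed here, is a linear backward model: ridge regression maps time-lagged EEG to the stimulus envelope, reconstruction accuracy is the Pearson correlation between the reconstructed and actual envelopes, and the attended stream is classified as the one with the higher correlation. The synthetic data, lag count, ridge value, and helper names below are hypothetical and do not reproduce the study's pipeline.

```python
import numpy as np

def lagged(eeg: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged EEG (T, C) into a design matrix (T, C * n_lags).
    Lag k uses eeg[t + k], because the neural response follows the stimulus."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[: T - k, k * C:(k + 1) * C] = eeg[k:]
    return X

def train_decoder(eeg: np.ndarray, envelope: np.ndarray, n_lags: int, ridge: float) -> np.ndarray:
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ envelope)

def decide(eeg, env_1, env_2, decoder, n_lags):
    """Reconstruct the envelope from EEG and pick the stream with the higher Pearson r."""
    rec = lagged(eeg, n_lags) @ decoder
    r1 = np.corrcoef(rec, env_1)[0, 1]
    r2 = np.corrcoef(rec, env_2)[0, 1]
    return (1 if r1 > r2 else 2), r1, r2

# Synthetic data: EEG tracks the attended envelope with a 3-sample delay, plus noise.
rng = np.random.default_rng(1)
T, C, delay = 2000, 8, 3
env_att, env_unatt = rng.normal(size=T), rng.normal(size=T)
eeg = 0.5 * rng.normal(size=(T, C))
eeg[delay:] += np.outer(env_att[:-delay], rng.normal(size=C))

half = T // 2                                          # train on the first half, test on the second
dec = train_decoder(eeg[:half], env_att[:half], n_lags=5, ridge=1.0)
choice, r_att, r_unatt = decide(eeg[half:], env_att[half:], env_unatt[half:], dec, n_lags=5)
```

Under this kind of model, a listener whose neural representation of the attended stream is more robust yields a larger gap between `r_att` and `r_unatt`. That gap is the quantity the study relates to executive functions.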
- Research Article
38
- 10.1109/tnnls.2023.3303308
- Dec 1, 2024
- IEEE transactions on neural networks and learning systems
- Siqi Cai + 2 more
Humans show a remarkable ability to solve the cocktail party problem. Decoding auditory attention from brain signals is a major step toward the development of bionic ears that emulate human capabilities. Electroencephalography (EEG)-based auditory attention detection (AAD) has attracted considerable interest recently. Despite much progress, the performance of traditional AAD decoders remains to be improved, especially in low-latency settings. State-of-the-art AAD decoders based on deep neural networks generally lack the intrinsic temporal coding ability of biological networks. In this study, we first propose a bio-inspired spiking attentional neural network, denoted BSAnet, for decoding auditory attention. BSAnet exploits the temporal dynamics of EEG signals using biologically plausible neurons and an attentional mechanism. Experiments on two publicly available datasets confirm the superior performance of BSAnet over other state-of-the-art systems across various evaluation conditions. Moreover, BSAnet imitates realistic brain-like information processing, through which we show the advantage of brain-inspired computational models.
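The abstract mentions "biologically plausible neurons" without naming the neuron model. The leaky integrate-and-fire (LIF) neuron is the most common choice in spiking networks and is assumed in the sketch below. The time constant, threshold, and reset values are illustrative, not taken from BSAnet.

```python
import numpy as np

def lif_layer(inputs: np.ndarray, tau: float = 2.0,
              v_threshold: float = 1.0, v_reset: float = 0.0) -> np.ndarray:
    """Discrete-time leaky integrate-and-fire neurons.

    inputs: (T, N) input current per time step and neuron.
    Returns a (T, N) array of binary spikes.
    """
    T, N = inputs.shape
    v = np.full(N, v_reset, dtype=float)
    spikes = np.zeros((T, N), dtype=np.int8)
    for t in range(T):
        v = v + (inputs[t] - (v - v_reset)) / tau    # leaky integration toward the input level
        fired = v >= v_threshold
        spikes[t] = fired
        v = np.where(fired, v_reset, v)              # hard reset after a spike
    return spikes

# Three neurons driven by constant currents of 0.5, 1.5, and 3.0 for four steps.
inputs = np.tile([0.5, 1.5, 3.0], (4, 1))
spikes = lif_layer(inputs)
```

With these settings, a current of 0.5 never reaches threshold, 1.5 fires every second step, and 3.0 fires every step. Information is therefore carried by spike timing and rate, which is the "intrinsic temporal coding" the abstract contrasts with conventional deep networks.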
- Research Article
2
- 10.3390/bioengineering11121216
- Nov 30, 2024
- Bioengineering
- Masoud Geravanchizadeh + 2 more
Attention is one of many human cognitive functions that are essential in everyday life. Given our limited processing capacity, attention helps us focus only on what matters. Focusing attention on one speaker in an environment with many speakers is a critical ability of the human auditory system. This paper proposes a new end-to-end method based on a combined transformer and graph convolutional neural network (TraGCNN) that can effectively detect auditory attention from electroencephalograms (EEGs). This approach eliminates the need for manual feature extraction, which is often time-consuming and subjective. First, the EEG signals are converted to graphs. We then extract attention information from these graphs using spatial and temporal approaches. Finally, our models are trained with these data. Our model can detect auditory attention in both the spatial and temporal domains. The EEG input is first processed by transformer layers to obtain a sequential representation of EEG based on attention onsets. Then, a family of graph convolutional layers is used to find the most active electrodes using the spatial positions of the electrodes. Finally, the corresponding EEG features of the active electrodes are fed into graph attention layers to detect auditory attention. The Fuglsang 2020 dataset is used in the experiments to train and test the proposed and baseline systems. Compared with state-of-the-art attention classification methods from the literature, the new TraGCNN approach yields the highest accuracy (80.12%). Additionally, the proposed model performs better than our previous graph-based model for different lengths of EEG segments. The new TraGCNN approach is advantageous because attention detection is achieved from the EEG signals of subjects without requiring speech stimuli, as is the case with conventional auditory attention detection methods.
Furthermore, examining the proposed model for different lengths of EEG segments shows that the model is faster than our previous graph-based detection method in terms of computational complexity. The findings of this study have important implications for the understanding and assessment of auditory attention, which is crucial for many applications, such as brain–computer interface (BCI) systems, speech separation, and neuro-steered hearing aid development.
- Research Article
4
- 10.7717/peerj-cs.2394
- Oct 30, 2024
- PeerJ. Computer science
- Tasleem Kausar + 6 more
Recent advances in auditory attention detection from multichannel electroencephalography (EEG) signals face two challenges: the scarcity of available online EEG data and the need to detect auditory attention with low latency. To this end, we propose a complete deep auditory generative adversarial network auxiliary, named auditory-GAN, designed to handle these challenges while generating EEG data and performing auditory spatial attention detection. The proposed auditory-GAN system consists of a spectro-spatial feature extraction (SSF) module and an auditory generative adversarial network auxiliary (AD-GAN) classifier. The SSF module extracts spatial feature maps by learning the topographic specificity of alpha power from EEG signals. The AD-GAN network addresses the need for extensive training data by synthesizing augmented versions of the original EEG data. We validated the proposed method on the widely used KUL dataset, assessing both the quality of the generated EEG images and the accuracy of auditory spatial attention detection. Results show that the proposed auditory-GAN can produce convincing EEG data and achieves a spatial attention detection accuracy of 98.5% for a 10-s decision window of 64-channel EEG data. Comparative analysis reveals that the proposed neural approach outperforms existing state-of-the-art models across EEG data ranging from 64 to 32 channels. The Auditory-GAN model is available at https://github.com/tasleem-hello/Auditory-GAN-/tree/Auditory-GAN.
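The SSF module learns "the topographic specificity of alpha power," but the abstract does not say how alpha power is computed. The sketch below assumes a plain periodogram and an 8-13 Hz alpha band, and computes one band-power value per channel. Values like these could then be arranged by electrode position into a 2-D scalp map; that step is not shown. The sampling rate and test signals are illustrative.

```python
import numpy as np

def alpha_power(eeg: np.ndarray, fs: float, band=(8.0, 13.0)) -> np.ndarray:
    """Mean periodogram power within an (assumed) 8-13 Hz alpha band for each channel.

    eeg: (C, T) array. Returns a (C,) array of band power in arbitrary units.
    """
    T = eeg.shape[1]
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)
    centered = eeg - eeg.mean(axis=1, keepdims=True)           # remove per-channel DC offset
    spectrum = np.abs(np.fft.rfft(centered, axis=1)) ** 2 / T   # periodogram
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[:, mask].mean(axis=1)

# 10-s window at an assumed 128 Hz: channel 0 carries a 10 Hz (alpha) rhythm, channel 1 a 20 Hz one.
fs = 128.0
t = np.arange(int(10 * fs)) / fs
rng = np.random.default_rng(0)
eeg = 0.5 * rng.normal(size=(2, t.size))
eeg[0] += 2.0 * np.sin(2 * np.pi * 10.0 * t)
eeg[1] += 2.0 * np.sin(2 * np.pi * 20.0 * t)
p = alpha_power(eeg, fs)
```

Only channel 0 shows elevated alpha power. Lateralized differences of this kind across the scalp are what a topographic alpha-power feature is meant to capture.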
- Research Article
17
- 10.1109/tcds.2024.3376433
- Oct 1, 2024
- IEEE Transactions on Cognitive and Developmental Systems
- Siqi Cai + 4 more
Decoding auditory attention from brain activity, such as electroencephalography (EEG), sheds light on solving the machine cocktail party problem. However, effective representation of EEG signals remains a challenge. One reason is that current feature extraction techniques have not fully exploited the spatial information across EEG channels. EEG signals reflect the collective dynamics of brain activity across different regions. The intricate interactions among these channels, rather than individual EEG channels alone, reflect the distinctive features of brain activity. In this study, we propose a spiking graph convolutional network, called SGCN, which captures the spatial features of multi-channel EEG in a biologically plausible manner. Comprehensive experiments were conducted on two publicly available datasets. Results demonstrate that the proposed SGCN achieves competitive auditory attention detection (AAD) performance in low-latency and low-density EEG settings. Because it features low power consumption, the SGCN has the potential for practical implementation in intelligent hearing aids and other BCIs.
- Research Article
3
- 10.1016/j.neuroscience.2024.09.017
- Sep 10, 2024
- Neuroscience
- Yixiang Niu + 4 more
Brain connectivity and time-frequency fusion-based auditory spatial attention detection
- Research Article
5
- 10.1016/j.heares.2024.109104
- Aug 14, 2024
- Hearing Research
- Yixiang Niu + 4 more
Subject-independent auditory spatial attention detection based on brain topology modeling and feature distribution alignment
- Research Article
28
- 10.1016/j.neunet.2024.106580
- Jul 26, 2024
- Neural Networks
- Cunhang Fan + 7 more
DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection
- Research Article
6
- 10.1109/embc53108.2024.10781617
- Jul 15, 2024
- Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
- Yuting Ding + 1 more
Auditory spatial attention detection (ASAD) is used to determine the direction of a listener's attention to a speaker by analyzing her/his electroencephalographic (EEG) signals. This study aimed to further improve the performance of ASAD with short decision windows (i.e., <1 s), rather than the long decision windows of 1 to 5 seconds used in previous studies. An end-to-end temporal attention network (TAnet) was introduced in this work. TAnet employs a multi-head attention (MHA) mechanism, which can more effectively capture the interactions among time steps in the collected EEG signals and efficiently assign corresponding weights to those EEG time steps. Experiments demonstrated that, compared with the CNN-based method and recent ASAD methods, TAnet provided improved decoding performance on the KUL dataset, with decoding accuracies of 92.4% (decision window 0.1 s), 94.9% (0.25 s), 95.1% (0.3 s), 95.4% (0.4 s), and 95.5% (0.5 s). As a new ASAD model with a short decision window, TAnet can potentially facilitate the design of EEG-controlled intelligent hearing aids and sound recognition systems.
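Several abstracts in this list, including TAnet's, report accuracy per decision window, meaning the length of EEG used for each attention decision. The sketch below shows one simple way such windows can be cut from a trial and scored. The 128 Hz sampling rate, 60-s trial, non-overlapping windows, and helper names are assumptions for illustration; individual papers may use overlapping windows or different rates.

```python
import numpy as np

def segment(eeg: np.ndarray, fs: float, window_s: float) -> np.ndarray:
    """Split a (T, C) EEG trial into non-overlapping decision windows of window_s seconds.

    Returns (n_windows, L, C); any trailing remainder shorter than one window is dropped.
    """
    L = int(round(window_s * fs))
    if L < 1:
        raise ValueError("decision window is shorter than one sample")
    n = eeg.shape[0] // L
    return eeg[: n * L].reshape(n, L, eeg.shape[1])

def window_accuracy(pred, labels) -> float:
    """Fraction of decision windows whose predicted direction matches the label."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    return float(np.mean(pred == labels))

# A 60-s, 64-channel trial at an assumed 128 Hz.
trial = np.zeros((60 * 128, 64))
shapes = {w: segment(trial, 128.0, w).shape for w in (0.1, 0.5, 1.0)}
```

Shorter windows give lower latency and many more decisions per trial, but each decision sees less EEG. This is why accuracy at 0.1 s is typically the hardest figure to push up.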